02. Analysis Seoul Crime¶

1. Project Overview¶

2. Data Overview¶

In [234]:

import numpy as np
import pandas as pd
import openpyxl

In [235]:

# Read the data
crime_raw_data = pd.read_csv("../data/02. crime_in_Seoul.csv", thousands=",", encoding="euc-kr")
# thousands=",": without it, comma-grouped numbers such as "1,234" may be read as strings
crime_raw_data.head()

Out[235]:

	구분	죄종	발생검거	건수
0	중부	살인	발생	2.0
1	중부	살인	검거	2.0
2	중부	강도	발생	3.0
3	중부	강도	검거	3.0
4	중부	강간	발생	141.0

In [236]:

crime_raw_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65534 entries, 0 to 65533
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   구분      310 non-null    object 
 1   죄종      310 non-null    object 
 2   발생검거    310 non-null    object 
 3   건수      310 non-null    float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB

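info(): shows a summary of the DataFrame.
The RangeIndex has 65,534 entries, but each column holds only 310 non-null values, so most rows are empty.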
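The gap between the index length and the non-null count suggests that the file carries many completely empty rows. A minimal sketch of counting and dropping such rows with `dropna(how="all")`, using a small made-up frame (the `toy` data and its column names are hypothetical, not the crime data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two real rows followed by three completely empty rows.
toy = pd.DataFrame(
    {
        "station": ["A", "B", np.nan, np.nan, np.nan],
        "count": [2.0, 3.0, np.nan, np.nan, np.nan],
    }
)

# Rows in which every column is NaN.
empty_rows = toy.isnull().all(axis=1)
n_empty = int(empty_rows.sum())

# Drop only the rows that are empty in every column.
cleaned = toy.dropna(how="all")

print(n_empty, len(cleaned))
```

Filtering on a single column with `notnull()` gives the same result when the empty rows are empty in every column.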

In [237]:

crime_raw_data["죄종"].unique()

Out[237]:

array(['살인', '강도', '강간', '절도', '폭력', nan], dtype=object)

Checking the unique values of a single column shows that it contains NaN.

In [238]:

crime_raw_data[crime_raw_data["죄종"].isnull()]

Out[238]:

	구분	죄종	발생검거	건수
310	NaN	NaN	NaN	NaN
311	NaN	NaN	NaN	NaN
312	NaN	NaN	NaN	NaN
313	NaN	NaN	NaN	NaN
314	NaN	NaN	NaN	NaN
...	...	...	...	...
65529	NaN	NaN	NaN	NaN
65530	NaN	NaN	NaN	NaN
65531	NaN	NaN	NaN	NaN
65532	NaN	NaN	NaN	NaN
65533	NaN	NaN	NaN	NaN

65224 rows × 4 columns

In [239]:

crime_raw_data = crime_raw_data[crime_raw_data["죄종"].notnull()]

In [240]:

crime_raw_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 309
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   구분      310 non-null    object 
 1   죄종      310 non-null    object 
 2   발생검거    310 non-null    object 
 3   건수      310 non-null    float64
dtypes: float64(1), object(3)
memory usage: 12.1+ KB

Pandas pivot table¶

index, columns, values, aggfunc

In [241]:

df = pd.read_excel("../data/02. sales-funnel.xlsx")
df.head()

Out[241]:

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won

In [242]:

# Set the Name column as the index (the two calls below are equivalent)
pd.pivot_table(df, index="Name")
df.pivot_table(index="Name")

C:\Users\admin\AppData\Local\Temp\ipykernel_9092\964206776.py:2: FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated.
  pd.pivot_table(df, index="Name")
C:\Users\admin\AppData\Local\Temp\ipykernel_9092\964206776.py:3: FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated.
  df.pivot_table(index="Name")

Out[242]:

	Account	Price	Quantity
Name
Barton LLC	740150	35000	1.000000
Fritsch, Russel and Anderson	737550	35000	1.000000
Herman LLC	141962	65000	2.000000
Jerde-Hilpert	412290	5000	2.000000
Kassulke, Ondricka and Metz	307599	7000	3.000000
Keeling LLC	688981	100000	5.000000
Kiehn-Spinka	146832	65000	2.000000
Koepp Ltd	729833	35000	2.000000
Kulas Inc	218895	25000	1.500000
Purdy-Kunde	163416	30000	1.000000
Stokes LLC	239344	7500	1.000000
Trantow-Barrows	714466	15000	1.333333
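The FutureWarning above most likely appears because `pivot_table`, left to its defaults, tries to average every remaining column, including text columns such as Rep. One way to avoid it is to name the numeric columns in `values`. A minimal sketch on a made-up miniature table (the `sales` frame and its contents are hypothetical):

```python
import pandas as pd

# Hypothetical miniature of the sales-funnel table (values are made up).
sales = pd.DataFrame(
    {
        "Name": ["Acme", "Acme", "Beta"],
        "Rep": ["Kim", "Kim", "Lee"],  # text column: cannot be averaged
        "Price": [30000, 10000, 35000],
        "Quantity": [1, 1, 2],
    }
)

# Naming the numeric columns explicitly keeps text columns out of the aggregation.
by_name = sales.pivot_table(index="Name", values=["Price", "Quantity"], aggfunc="mean")
print(by_name)
```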

In [243]:

# Use a multi-level index
df.pivot_table(index=["Name","Rep","Manager"])

C:\Users\admin\AppData\Local\Temp\ipykernel_9092\3928218144.py:2: FutureWarning: pivot_table dropped a column because it failed to aggregate. This behavior is deprecated and will raise in a future version of pandas. Select only the columns that can be aggregated.
  df.pivot_table(index=["Name","Rep","Manager"])

Out[243]:

			Account	Price	Quantity
Name	Rep	Manager
Barton LLC	John Smith	Debra Henley	740150	35000	1.000000
Fritsch, Russel and Anderson	Craig Booker	Debra Henley	737550	35000	1.000000
Herman LLC	Cedric Moss	Fred Anderson	141962	65000	2.000000
Jerde-Hilpert	John Smith	Debra Henley	412290	5000	2.000000
Kassulke, Ondricka and Metz	Wendy Yule	Fred Anderson	307599	7000	3.000000
Keeling LLC	Wendy Yule	Fred Anderson	688981	100000	5.000000
Kiehn-Spinka	Daniel Hilton	Debra Henley	146832	65000	2.000000
Koepp Ltd	Wendy Yule	Fred Anderson	729833	35000	2.000000
Kulas Inc	Daniel Hilton	Debra Henley	218895	25000	1.500000
Purdy-Kunde	Cedric Moss	Fred Anderson	163416	30000	1.000000
Stokes LLC	Cedric Moss	Fred Anderson	239344	7500	1.000000
Trantow-Barrows	Craig Booker	Debra Henley	714466	15000	1.333333

Setting values¶

In [244]:

df.head()

Out[244]:

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won

In [245]:

df.pivot_table(index=["Manager","Rep"], values="Price")

Out[245]:

		Price
Manager	Rep
Debra Henley	Craig Booker	20000.000000
	Daniel Hilton	38333.333333
	John Smith	20000.000000
Fred Anderson	Cedric Moss	27500.000000
Fred Anderson	Wendy Yule	44250.000000

In [246]:

# Apply sum to the Price column
df.pivot_table(index=["Manager","Rep"], values="Price",aggfunc=np.sum)

Out[246]:

		Price
Manager	Rep
Debra Henley	Craig Booker	80000
	Daniel Hilton	115000
	John Smith	40000
Fred Anderson	Cedric Moss	110000
Fred Anderson	Wendy Yule	177000

In [247]:

df.pivot_table(index=["Manager","Rep"], values="Price",aggfunc=[np.sum,len])

Out[247]:

		sum	len
		Price	Price
Manager	Rep
Debra Henley	Craig Booker	80000	4
	Daniel Hilton	115000	3
	John Smith	40000	2
Fred Anderson	Cedric Moss	110000	4
Fred Anderson	Wendy Yule	177000	4

Setting columns¶

In [248]:

df.head()

Out[248]:

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won

In [249]:

# Use Product as the columns
df.pivot_table(index=["Manager","Rep"], values="Price",columns="Product",aggfunc=[np.sum,len])

Out[249]:

		sum				len
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000.0	5000.0	NaN	10000.0	2.0	1.0	NaN	1.0
	Daniel Hilton	105000.0	NaN	NaN	10000.0	2.0	NaN	NaN	1.0
	John Smith	35000.0	5000.0	NaN	NaN	1.0	1.0	NaN	NaN
Fred Anderson	Cedric Moss	95000.0	5000.0	NaN	10000.0	2.0	1.0	NaN	1.0
Fred Anderson	Wendy Yule	165000.0	7000.0	5000.0	NaN	2.0	1.0	1.0	NaN

In [250]:

# Replace NaN with a chosen value: fill_value
df.pivot_table(index=["Manager","Rep"], values="Price",columns="Product",aggfunc=[np.sum,len],fill_value=0)

Out[250]:

		sum				len
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000	5000	0	10000	2	1	0	1
	Daniel Hilton	105000	0	0	10000	2	0	0	1
	John Smith	35000	5000	0	0	1	1	0	0
Fred Anderson	Cedric Moss	95000	5000	0	10000	2	1	0	1
Fred Anderson	Wendy Yule	165000	7000	5000	0	2	1	1	0

In [251]:

# Two or more entries for index and values
df.pivot_table(index=["Manager","Rep","Product"],values=["Price","Quantity"],aggfunc=np.sum,fill_value=0)

Out[251]:

			Price	Quantity
Manager	Rep	Product
Debra Henley	Craig Booker	CPU	65000	2
		Maintenance	5000	2
		Software	10000	1
	Daniel Hilton	CPU	105000	4
	Daniel Hilton	Software	10000	1
	John Smith	CPU	35000	1
	John Smith	Maintenance	5000	2
Fred Anderson	Cedric Moss	CPU	95000	3
		Maintenance	5000	1
		Software	10000	1
	Wendy Yule	CPU	165000	7
		Maintenance	7000	3
		Monitor	5000	2

In [252]:

# Two or more aggregation functions
df.pivot_table(
    index=["Manager","Rep"],
    values=["Price","Quantity"],
    columns="Product",
    aggfunc=[np.sum,np.mean],
    fill_value=0,
    margins = True)  # add totals (All)

Out[252]:

		sum										mean
		Price					Quantity					Price					Quantity
	Product	CPU	Maintenance	Monitor	Software	All	CPU	Maintenance	Monitor	Software	All	CPU	Maintenance	Monitor	Software	All	CPU	Maintenance	Monitor	Software	All
Manager	Rep
Debra Henley	Craig Booker	65000	5000	0	10000	80000	2	2	0	1	5	32500.000000	5000	0	10000	20000.000000	1.000000	2	0	1	1.250000
	Daniel Hilton	105000	0	0	10000	115000	4	0	0	1	5	52500.000000	0	0	10000	38333.333333	2.000000	0	0	1	1.666667
	John Smith	35000	5000	0	0	40000	1	2	0	0	3	35000.000000	5000	0	0	20000.000000	1.000000	2	0	0	1.500000
Fred Anderson	Cedric Moss	95000	5000	0	10000	110000	3	1	0	1	5	47500.000000	5000	0	10000	27500.000000	1.500000	1	0	1	1.250000
Fred Anderson	Wendy Yule	165000	7000	5000	0	177000	7	3	2	0	12	82500.000000	7000	5000	0	44250.000000	3.500000	3	2	0	3.000000
All		465000	22000	5000	30000	522000	17	8	2	3	30	51666.666667	5500	5000	10000	30705.882353	1.888889	2	2	1	1.764706

3. Organizing the Seoul Crime Data¶

In [253]:

crime_raw_data.head()

Out[253]:

	구분	죄종	발생검거	건수
0	중부	살인	발생	2.0
1	중부	살인	검거	2.0
2	중부	강도	발생	3.0
3	중부	강도	검거	3.0
4	중부	강간	발생	141.0

In [254]:

crime_station = crime_raw_data.pivot_table(index="구분", columns = ["죄종","발생검거"],aggfunc=[np.sum])
crime_station.head()

Out[254]:

	sum
	건수
죄종	강간		강도		살인		절도		폭력
발생검거	검거	발생	검거	발생	검거	발생	검거	발생	검거	발생
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0

In [255]:

crime_station.columns  #Multi index

Out[255]:

MultiIndex([('sum', '건수', '강간', '검거'),
            ('sum', '건수', '강간', '발생'),
            ('sum', '건수', '강도', '검거'),
            ('sum', '건수', '강도', '발생'),
            ('sum', '건수', '살인', '검거'),
            ('sum', '건수', '살인', '발생'),
            ('sum', '건수', '절도', '검거'),
            ('sum', '건수', '절도', '발생'),
            ('sum', '건수', '폭력', '검거'),
            ('sum', '건수', '폭력', '발생')],
           names=[None, None, '죄종', '발생검거'])

In [256]:

crime_station["sum","건수","강도","검거"][:5]

Out[256]:

구분
강남    26.0
강동    13.0
강북     4.0
강서    10.0
관악    10.0
Name: (sum, 건수, 강도, 검거), dtype: float64

In [257]:

crime_station.columns = crime_station.columns.droplevel([0,1]) # drop the top two levels ('sum', '건수') of the column MultiIndex
crime_station.columns

Out[257]:

MultiIndex([('강간', '검거'),
            ('강간', '발생'),
            ('강도', '검거'),
            ('강도', '발생'),
            ('살인', '검거'),
            ('살인', '발생'),
            ('절도', '검거'),
            ('절도', '발생'),
            ('폭력', '검거'),
            ('폭력', '발생')],
           names=['죄종', '발생검거'])
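As a self-contained illustration of the same step, the sketch below builds a small frame with four column levels and removes the top two. The `toy` frame and its English labels are hypothetical stand-ins for the pivot output:

```python
import pandas as pd

# Hypothetical frame with four column levels, mimicking the pivot output above.
cols = pd.MultiIndex.from_tuples(
    [("sum", "n", "theft", "arrest"), ("sum", "n", "theft", "occur")],
    names=[None, None, "crime", "kind"],
)
toy = pd.DataFrame([[1.0, 2.0]], index=["station_a"], columns=cols)

# droplevel returns a new index; the frame changes only when it is assigned back.
toy.columns = toy.columns.droplevel([0, 1])

print(toy.columns.nlevels)
print(toy[("theft", "arrest")].iloc[0])
```

After the assignment, a column can be selected with a two-element tuple instead of a four-element one.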

In [258]:

crime_station.head()

Out[258]:

죄종	강간		강도		살인		절도		폭력
발생검거	검거	발생	검거	발생	검거	발생	검거	발생	검거	발생
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0

4. Installing Python Modules¶

pip commands¶

Python's official package manager
pip list
pip install module_name
pip uninstall module_name
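To check a single package from Python rather than scanning the full `pip list` output, the standard library's `importlib.metadata` can be queried. A minimal sketch; the helper name `installed_version` and the made-up package name are assumptions for illustration:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "definitely-not-installed-pkg" is a made-up name used to show the missing case.
for name in ["pandas", "definitely-not-installed-pkg"]:
    print(name, installed_version(name))
```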

In [259]:

# Both lines run the same shell command; "!" is IPython shorthand for get_ipython().system(...)
!pip list
get_ipython().system("pip list")

Package              Version
-------------------- -----------
anyio                3.5.0
argon2-cffi          21.3.0
argon2-cffi-bindings 21.2.0
asttokens            2.0.5
attrs                22.1.0
Babel                2.11.0
backcall             0.2.0
beautifulsoup4       4.11.1
bleach               4.1.0
Bottleneck           1.3.5
branca               0.6.0
brotlipy             0.7.0
certifi              2022.12.7
cffi                 1.15.1
charset-normalizer   2.0.4
colorama             0.4.6
comm                 0.1.2
contourpy            1.0.5
cryptography         38.0.4
cycler               0.11.0
debugpy              1.5.1
decorator            5.1.1
defusedxml           0.7.1
entrypoints          0.4
et-xmlfile           1.1.0
executing            0.8.3
fastjsonschema       2.16.2
flit_core            3.6.0
folium               0.14.0
fonttools            4.25.0
googlemaps           2.5.1
idna                 3.4
importlib-metadata   4.11.3
importlib-resources  5.2.0
ipykernel            6.19.2
ipython              8.10.0
ipython-genutils     0.2.0
ipywidgets           7.6.5
jedi                 0.18.1
Jinja2               3.1.2
json5                0.9.6
jsonschema           4.17.3
jupyter              1.0.0
jupyter_client       7.4.9
jupyter-console      6.4.4
jupyter_core         5.2.0
jupyter-server       1.23.4
jupyterlab           3.5.3
jupyterlab-pygments  0.1.2
jupyterlab_server    2.19.0
jupyterlab-widgets   1.0.0
kiwisolver           1.4.4
lxml                 4.9.1
MarkupSafe           2.1.1
matplotlib           3.6.2
matplotlib-inline    0.1.6
mistune              0.8.4
mkl-fft              1.3.1
mkl-random           1.2.2
mkl-service          2.4.0
munkres              1.1.4
nbclassic            0.5.2
nbclient             0.5.13
nbconvert            6.5.4
nbformat             5.7.0
nest-asyncio         1.5.6
notebook             6.5.2
notebook_shim        0.2.2
numexpr              2.8.4
numpy                1.23.5
openpyxl             3.1.1
packaging            22.0
pandas               1.5.2
pandocfilters        1.5.0
parso                0.8.3
patsy                0.5.3
pickleshare          0.7.5
Pillow               9.3.0
pip                  22.3.1
pkgutil_resolve_name 1.3.10
platformdirs         2.5.2
ply                  3.11
prometheus-client    0.14.1
prompt-toolkit       3.0.36
psutil               5.9.0
pure-eval            0.2.2
pycparser            2.21
Pygments             2.11.2
pyOpenSSL            22.0.0
pyparsing            3.0.9
PyQt5                5.15.7
PyQt5-sip            12.11.0
pyrsistent           0.18.0
PySocks              1.7.1
python-dateutil      2.8.2
pytz                 2022.7
pywin32              305.1
pywinpty             2.0.2
pyzmq                23.2.0
qtconsole            5.4.0
QtPy                 2.2.0
requests             2.28.1
scipy                1.10.1
seaborn              0.12.2
Send2Trash           1.8.0
setuptools           65.6.3
sip                  6.6.2
six                  1.16.0
sniffio              1.2.0
soupsieve            2.3.2.post1
stack-data           0.2.0
statsmodels          0.13.5
terminado            0.17.1
tinycss2             1.2.1
toml                 0.10.2
tomli                2.0.1
tornado              6.2
traitlets            5.7.1
typing_extensions    4.4.0
urllib3              1.26.14
wcwidth              0.2.5
webencodings         0.5.1
websocket-client     0.58.0
wheel                0.38.4
widgetsnbextension   3.5.2
win-inet-pton        1.1.0
wincertstore         0.2
xlrd                 2.0.1
zipp                 3.11.0

conda commands¶

conda list
conda install module_name
conda uninstall module_name
conda install -c channel_name module_name
- installs the module from the specified distribution channel

5. Installing the Google Maps API¶

In [260]:

# Google account API key
# AIzaSyBRwnhXCj6U8hJkACJjk3CdL1aLzh_Knso

conda install -c conda-forge googlemaps

In [261]:

import googlemaps

In [262]:

gmaps_key = "AIzaSyBRwnhXCj6U8hJkACJjk3CdL1aLzh_Knso"
gmaps= googlemaps.Client(key=gmaps_key)

In [263]:

gmaps.geocode("서울영등포경찰서",language="ko")

Out[263]:

[{'address_components': [{'long_name': '608',
    'short_name': '608',
    'types': ['premise']},
   {'long_name': '국회대로',
    'short_name': '국회대로',
    'types': ['political', 'sublocality', 'sublocality_level_4']},
   {'long_name': '영등포구',
    'short_name': '영등포구',
    'types': ['political', 'sublocality', 'sublocality_level_1']},
   {'long_name': '서울특별시',
    'short_name': '서울특별시',
    'types': ['administrative_area_level_1', 'political']},
   {'long_name': '대한민국',
    'short_name': 'KR',
    'types': ['country', 'political']},
   {'long_name': '150-043',
    'short_name': '150-043',
    'types': ['postal_code']}],
  'formatted_address': '대한민국 서울특별시 영등포구 국회대로 608',
  'geometry': {'location': {'lat': 37.5260441, 'lng': 126.9008091},
   'location_type': 'ROOFTOP',
   'viewport': {'northeast': {'lat': 37.5273930802915,
     'lng': 126.9021580802915},
    'southwest': {'lat': 37.5246951197085, 'lng': 126.8994601197085}}},
  'partial_match': True,
  'place_id': 'ChIJ1TimJLaffDURptXOs0Tj6sY',
  'plus_code': {'compound_code': 'GWG2+C8 대한민국 서울특별시',
   'global_code': '8Q98GWG2+C8'},
  'types': ['establishment', 'point_of_interest', 'police']}]

Python Loops¶

A simple for loop example¶

In [264]:

for n in [1,2,3,4]:
    print("Number is", n)

Number is 1
Number is 2
Number is 3
Number is 4

A slightly more complex for loop example¶

In [265]:

for n in range(0,10):
    print(n ** 2)

The code above in one line: list comprehension¶

In [266]:

[n ** 2 for n in range(0, 10)]

Out[266]:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

iterrows(): a looping method well suited to Pandas¶

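Pandas DataFrames are mostly two-dimensional.
A plain for loop over one has to keep specifying the n-th position, which hurts readability.
When looping over a Pandas DataFrame, iterrows() is convenient.
Just note that each iteration yields two things: the index and the row contents.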
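A minimal sketch of that unpacking, on a hypothetical two-row frame (`toy` and its column names are made up for illustration):

```python
import pandas as pd

# Hypothetical two-row frame; the index plays the role of the station name.
toy = pd.DataFrame({"occur": [10, 20], "arrest": [8, 15]}, index=["A", "B"])

labels = []
for idx, row in toy.iterrows():
    # idx is the index label; row is a Series holding that row's values.
    labels.append(f"{idx}: {row['arrest']}/{row['occur']}")

print(labels)
```

Each `row` is a copy of the data, so writing results back into the frame is done through `toy.loc[idx, column]` rather than through `row`.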

6. Organizing the Data with Google Maps¶

In [267]:

gmaps.geocode("서울영등포경찰서",language="ko")

Out[267]:

[{'address_components': [{'long_name': '608',
    'short_name': '608',
    'types': ['premise']},
   {'long_name': '국회대로',
    'short_name': '국회대로',
    'types': ['political', 'sublocality', 'sublocality_level_4']},
   {'long_name': '영등포구',
    'short_name': '영등포구',
    'types': ['political', 'sublocality', 'sublocality_level_1']},
   {'long_name': '서울특별시',
    'short_name': '서울특별시',
    'types': ['administrative_area_level_1', 'political']},
   {'long_name': '대한민국',
    'short_name': 'KR',
    'types': ['country', 'political']},
   {'long_name': '150-043',
    'short_name': '150-043',
    'types': ['postal_code']}],
  'formatted_address': '대한민국 서울특별시 영등포구 국회대로 608',
  'geometry': {'location': {'lat': 37.5260441, 'lng': 126.9008091},
   'location_type': 'ROOFTOP',
   'viewport': {'northeast': {'lat': 37.5273930802915,
     'lng': 126.9021580802915},
    'southwest': {'lat': 37.5246951197085, 'lng': 126.8994601197085}}},
  'partial_match': True,
  'place_id': 'ChIJ1TimJLaffDURptXOs0Tj6sY',
  'plus_code': {'compound_code': 'GWG2+C8 대한민국 서울특별시',
   'global_code': '8Q98GWG2+C8'},
  'types': ['establishment', 'point_of_interest', 'police']}]

In [268]:

tmp = gmaps.geocode("서울영등포경찰서",language="ko")

In [269]:

tmp[0].get("geometry")["location"]

Out[269]:

{'lat': 37.5260441, 'lng': 126.9008091}

In [270]:

print(tmp[0].get("geometry")["location"]["lat"])
print(tmp[0].get("geometry")["location"]["lng"])

37.5260441
126.9008091

In [271]:

tmp[0].get("formatted_address")

Out[271]:

'대한민국 서울특별시 영등포구 국회대로 608'

In [272]:

tmp[0].get("formatted_address").split()[2]

Out[272]:

'영등포구'
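The lookups above can be gathered into one helper. The sketch below works on a trimmed copy of the response shown earlier, so it runs without calling the API. The helper name `extract_gu_lat_lng` is hypothetical, and it assumes the "country city district ..." word order seen in this one formatted address:

```python
# Trimmed copy of the response shown above; only the fields used here are kept.
sample_response = [
    {
        "formatted_address": "대한민국 서울특별시 영등포구 국회대로 608",
        "geometry": {"location": {"lat": 37.5260441, "lng": 126.9008091}},
    }
]


def extract_gu_lat_lng(response):
    """Return (district, lat, lng) from a geocode-style response, or None if empty."""
    if not response:
        return None
    first = response[0]
    parts = first.get("formatted_address", "").split()
    # Assumes the "country city district ..." word order seen in the output above.
    gu = parts[2] if len(parts) > 2 else None
    location = first.get("geometry", {}).get("location", {})
    return gu, location.get("lat"), location.get("lng")


print(extract_gu_lat_lng(sample_response))
print(extract_gu_lat_lng([]))
```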

In [273]:

crime_station.head()

Out[273]:

죄종	강간		강도		살인		절도		폭력
발생검거	검거	발생	검거	발생	검거	발생	검거	발생	검거	발생
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0

In [274]:

crime_station["구별"] = np.nan
crime_station["lat"] = np.nan
crime_station["lng"] = np.nan

In [275]:

crime_station.head()

Out[275]:

죄종	강간		강도		살인		절도		폭력		구별	lat	lng
발생검거	검거	발생	검거	발생	검거	발생	검거	발생	검거	발생
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0	NaN	NaN	NaN
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	NaN	NaN	NaN
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0	NaN	NaN	NaN
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0	NaN	NaN	NaN
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0	NaN	NaN	NaN

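Get the district (gu) that each police station belongs to from the station name.
Prepare columns to store the district name, latitude, and longitude.
Use a loop with iterrows() to fill in every NaN in the table above.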

In [276]:

count = 0

for idx, rows in crime_station.iterrows():
    station_name = "서울" + str(idx) + "경찰서"
    tmp = gmaps.geocode(station_name, language = "ko")
    
    tmp_gu = tmp[0].get("formatted_address").split()[2]
    lat = tmp[0].get("geometry")["location"]["lat"]
    lng = tmp[0].get("geometry")["location"]["lng"]
    
    crime_station.loc[idx,"lat"] = lat
    crime_station.loc[idx,"lng"] = lng
    crime_station.loc[idx,"구별"] = tmp_gu
    
    print(count)   # check that the loop is progressing
    count += 1
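The loop above indexes `tmp[0]` directly, so a query that returns an empty list would stop it with an IndexError. A hedged sketch of one way to guard against that, using a stand-in function so it runs offline: `fake_geocode`, `FAKE_RESULTS`, the toy frame, and the street address are all made up for illustration and are not part of the googlemaps library.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for gmaps.geocode so the sketch runs offline.
FAKE_RESULTS = {
    "서울강남경찰서": [
        {
            "formatted_address": "대한민국 서울특별시 강남구 테헤란로 1",  # made-up address
            "geometry": {"location": {"lat": 37.5, "lng": 127.0}},
        }
    ],
}


def fake_geocode(query):
    return FAKE_RESULTS.get(query, [])


toy = pd.DataFrame({"발생": [3.0, 5.0]}, index=["강남", "없는서"])
toy["구별"] = None
toy["lat"] = np.nan
toy["lng"] = np.nan

failed = []
for idx in toy.index:
    result = fake_geocode("서울" + str(idx) + "경찰서")
    if not result:
        failed.append(idx)  # keep the empty values and remember the name for a retry
        continue
    location = result[0]["geometry"]["location"]
    toy.loc[idx, "구별"] = result[0]["formatted_address"].split()[2]
    toy.loc[idx, "lat"] = location["lat"]
    toy.loc[idx, "lng"] = location["lng"]

print(failed)
```

Initialising 구별 with None rather than np.nan keeps that column as object dtype, which suits string values.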

In [277]:

crime_station.head()

Out[277]:

죄종	강간		강도		살인		절도		폭력		구별	lat	lng
발생검거	검거	발생	검거	발생	검거	발생	검거	발생	검거	발생
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0	강남구	37.509435	127.066958
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	강동구	37.528511	127.126822
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0	강북구	37.637197	127.027305
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0	양천구	37.539783	126.829997
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0	관악구	37.474395	126.951349

In [278]:

crime_station.columns.get_level_values(0) + crime_station.columns.get_level_values(1)

Out[278]:

Index(['강간검거', '강간발생', '강도검거', '강도발생', '살인검거', '살인발생', '절도검거', '절도발생', '폭력검거',
       '폭력발생', '구별', 'lat', 'lng'],
      dtype='object')

In [279]:

crime_station.columns.get_level_values(0)[12]

Out[279]:

'lng'

In [280]:

tmp = [crime_station.columns.get_level_values(0)[n] + crime_station.columns.get_level_values(1)[n]
    for n in range(0, len(crime_station.columns.get_level_values(0)))
]
tmp

Out[280]:

['강간검거',
 '강간발생',
 '강도검거',
 '강도발생',
 '살인검거',
 '살인발생',
 '절도검거',
 '절도발생',
 '폭력검거',
 '폭력발생',
 '구별',
 'lat',
 'lng']

In [281]:

crime_station.columns = tmp
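The same flattening can be written more compactly, because iterating over a column MultiIndex yields one tuple of level values per column. A minimal sketch on a hypothetical three-column frame, where the added `lat` column has an empty second level as in the table above:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("강간", "검거"), ("강간", "발생"), ("lat", "")],
    names=["죄종", "발생검거"],
)
toy = pd.DataFrame([[1.0, 2.0, 37.5]], columns=cols)

# Each column label is a tuple of level values; joining them gives one flat name.
toy.columns = ["".join(col) for col in toy.columns]

print(list(toy.columns))
```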

In [282]:

crime_station.head()

Out[282]:

	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생	구별	lat	lng
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0	강남구	37.509435	127.066958
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	강동구	37.528511	127.126822
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0	강북구	37.637197	127.027305
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0	양천구	37.539783	126.829997
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0	관악구	37.474395	126.951349

In [283]:

# Save the data
crime_station.to_csv("../data/02. crime_in_Seoul_raw.csv", sep=",", encoding = "utf-8")

In [284]:

pd.read_csv("../data/02. crime_in_Seoul_raw.csv").head(2)

Out[284]:

	구분	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생	구별	lat	lng
0	강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0	강남구	37.509435	127.066958
1	강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	강동구	37.528511	127.126822

7. Aggregating the Data by District¶

In [285]:

crime_anal_station = pd.read_csv("../data/02. crime_in_Seoul_raw.csv", index_col = 0, encoding = "utf-8")
# index_col=0 uses the first column (구분) as the index
crime_anal_station.head()

Out[285]:

	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생	구별	lat	lng
구분
강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0	강남구	37.509435	127.066958
강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	강동구	37.528511	127.126822
강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0	강북구	37.637197	127.027305
강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0	양천구	37.539783	126.829997
관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0	관악구	37.474395	126.951349

In [286]:

crime_anal_gu = pd.pivot_table(crime_anal_station, index="구별", aggfunc = np.sum)
del crime_anal_gu["lat"]
crime_anal_gu.drop("lng", axis = 1, inplace = True)

crime_anal_gu.head()

Out[286]:

	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생
구별
강남구	413.0	516.0	42.0	39.0	5.0	5.0	1918.0	3587.0	3527.0	4002.0
강동구	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0
강북구	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0
관악구	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0
광진구	234.0	279.0	6.0	11.0	4.0	4.0	1057.0	2636.0	2011.0	2392.0

In [287]:

# Divide several columns by a single column

crime_anal_gu[["강도검거", "살인검거"]].div(crime_anal_gu["강도발생"], axis=0).head()

Out[287]:

	강도검거	살인검거
구별
강남구	1.076923	0.128205
강동구	0.928571	0.357143
강북구	0.800000	1.200000
관악구	0.833333	0.583333
광진구	0.545455	0.363636

In [288]:

# Divide several columns by several columns, pairwise

num = ["강간검거", "강도검거", "살인검거", "절도검거", "폭력검거"]
den = ["강간발생", "강도발생", "살인발생", "절도발생", "폭력발생"]

crime_anal_gu[num].div(crime_anal_gu[den].values).head()

Out[288]:

	강간검거	강도검거	살인검거	절도검거	폭력검거
구별
강남구	0.800388	1.076923	1.000000	0.534709	0.881309
강동구	0.950000	0.928571	1.250000	0.514253	0.869960
강북구	0.732719	0.800000	0.857143	0.549918	0.893449
관악구	0.819876	0.833333	1.166667	0.445554	0.836785
광진구	0.838710	0.545455	1.000000	0.400986	0.840719
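The `.values` in the division matters: dividing one DataFrame by another aligns on column labels, and the arrest and occurrence columns have different names. A minimal sketch of the difference, using a hypothetical two-column frame:

```python
import pandas as pd

toy = pd.DataFrame(
    {"a_arrest": [8.0, 3.0], "a_occur": [10.0, 4.0]},
    index=["x", "y"],
)

# DataFrame / DataFrame aligns on column labels; none match here, so all NaN.
aligned = toy[["a_arrest"]].div(toy[["a_occur"]])

# Passing a plain NumPy array skips alignment and divides position by position.
positional = toy[["a_arrest"]].div(toy[["a_occur"]].values)

print(aligned.isna().all().all())
print(positional["a_arrest"].tolist())
```

Because the division is positional, the numerator and denominator lists must name the crime types in the same order.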

In [289]:

target = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"]
num = ["강간검거", "강도검거", "살인검거", "절도검거", "폭력검거"]
den = ["강간발생", "강도발생", "살인발생", "절도발생", "폭력발생"]

crime_anal_gu[target] = crime_anal_gu[num].div(crime_anal_gu[den].values) * 100
crime_anal_gu.head()

Out[289]:

	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	413.0	516.0	42.0	39.0	5.0	5.0	1918.0	3587.0	3527.0	4002.0	80.038760	107.692308	100.000000	53.470867	88.130935
강동구	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	95.000000	92.857143	125.000000	51.425314	86.996047
강북구	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0	73.271889	80.000000	85.714286	54.991817	89.344852
관악구	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0	81.987578	83.333333	116.666667	44.555397	83.678516
광진구	234.0	279.0	6.0	11.0	4.0	4.0	1057.0	2636.0	2011.0	2392.0	83.870968	54.545455	100.000000	40.098634	84.071906

In [290]:

del crime_anal_gu["강간검거"]
del crime_anal_gu["강도검거"]
del crime_anal_gu["살인검거"]
del crime_anal_gu["절도검거"]
del crime_anal_gu["폭력검거"]
crime_anal_gu.head()

Out[290]:

	강간발생	강도발생	살인발생	절도발생	폭력발생	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	516.0	39.0	5.0	3587.0	4002.0	80.038760	107.692308	100.000000	53.470867	88.130935
강동구	160.0	14.0	4.0	1754.0	2530.0	95.000000	92.857143	125.000000	51.425314	86.996047
강북구	217.0	5.0	7.0	1222.0	2778.0	73.271889	80.000000	85.714286	54.991817	89.344852
관악구	322.0	12.0	6.0	2103.0	3235.0	81.987578	83.333333	116.666667	44.555397	83.678516
광진구	279.0	11.0	4.0	2636.0	2392.0	83.870968	54.545455	100.000000	40.098634	84.071906

In [291]:

# Find values greater than 100 and replace them with 100

crime_anal_gu[crime_anal_gu[target] > 100] = 100
crime_anal_gu.head()

Out[291]:

	강간발생	강도발생	살인발생	절도발생	폭력발생	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	516.0	39.0	5.0	3587.0	4002.0	80.038760	100.000000	100.000000	53.470867	88.130935
강동구	160.0	14.0	4.0	1754.0	2530.0	95.000000	92.857143	100.000000	51.425314	86.996047
강북구	217.0	5.0	7.0	1222.0	2778.0	73.271889	80.000000	85.714286	54.991817	89.344852
관악구	322.0	12.0	6.0	2103.0	3235.0	81.987578	83.333333	100.000000	44.555397	83.678516
광진구	279.0	11.0	4.0	2636.0	2392.0	83.870968	54.545455	100.000000	40.098634	84.071906
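An alternative way to cap the rates is `clip(upper=100)`, applied to the rate columns only. A minimal sketch on a hypothetical two-row frame that reuses two of the rate values shown earlier:

```python
import pandas as pd

toy = pd.DataFrame(
    {"occur": [39.0, 14.0], "rate": [107.692308, 92.857143]},
    index=["강남구", "강동구"],
)

# clip(upper=100) caps only the selected rate column and leaves counts untouched.
toy["rate"] = toy["rate"].clip(upper=100)

print(toy["rate"].tolist())
```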

In [292]:

crime_anal_gu.rename(columns =
                     {"강간발생":"강간",
                     "살인발생":"살인",
                     "절도발생":"절도",
                     "폭력발생":"폭력",
                     "강도발생":"강도"},
                    inplace = True)
crime_anal_gu.head()

Out[292]:

	강간	강도	살인	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	516.0	39.0	5.0	3587.0	4002.0	80.038760	100.000000	100.000000	53.470867	88.130935
강동구	160.0	14.0	4.0	1754.0	2530.0	95.000000	92.857143	100.000000	51.425314	86.996047
강북구	217.0	5.0	7.0	1222.0	2778.0	73.271889	80.000000	85.714286	54.991817	89.344852
관악구	322.0	12.0	6.0	2103.0	3235.0	81.987578	83.333333	100.000000	44.555397	83.678516
광진구	279.0	11.0	4.0	2636.0	2392.0	83.870968	54.545455	100.000000	40.098634	84.071906

8. Preparing the Crime Data for Sorting¶

In [293]:

crime_anal_gu.head()

Out[293]:

	강간	강도	살인	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	516.0	39.0	5.0	3587.0	4002.0	80.038760	100.000000	100.000000	53.470867	88.130935
강동구	160.0	14.0	4.0	1754.0	2530.0	95.000000	92.857143	100.000000	51.425314	86.996047
강북구	217.0	5.0	7.0	1222.0	2778.0	73.271889	80.000000	85.714286	54.991817	89.344852
관악구	322.0	12.0	6.0	2103.0	3235.0	81.987578	83.333333	100.000000	44.555397	83.678516
광진구	279.0	11.0	4.0	2636.0	2392.0	83.870968	54.545455	100.000000	40.098634	84.071906

In [294]:

# Normalization: divide by the column maximum so the largest value becomes 1 (the smallest is not necessarily 0)
crime_anal_gu["강도"] / crime_anal_gu["강도"].max()

Out[294]:

구별
강남구     1.000000
강동구     0.358974
강북구     0.128205
관악구     0.307692
광진구     0.282051
구로구     0.256410
금천구     0.179487
노원구     0.153846
도봉구     0.128205
동대문구    0.256410
동작구     0.179487
마포구     0.102564
서대문구    0.128205
서초구     0.333333
성동구     0.076923
성북구     0.205128
송파구     0.384615
양천구     0.435897
영등포구    0.487179
용산구     0.230769
은평구     0.230769
종로구     0.307692
중구      0.205128
중랑구     0.358974
Name: 강도, dtype: float64

In [295]:

col = ["살인", "강도", "강간", "절도", "폭력"]
crime_anal_norm = crime_anal_gu[col] / crime_anal_gu[col].max()
crime_anal_norm.head()

Out[295]:

	살인	강도	강간	절도	폭력
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773
강동구	0.285714	0.358974	0.310078	0.477799	0.463880
강북구	0.500000	0.128205	0.420543	0.332879	0.509351
관악구	0.428571	0.307692	0.624031	0.572868	0.593143
광진구	0.285714	0.282051	0.540698	0.718060	0.438577
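
Dividing by the column maximum maps the largest value to 1, but the smallest value only reaches 0 if the raw count is 0. The sketch below contrasts this max scaling with min-max scaling on three robbery counts copied from the table above. Min-max scaling is shown only for comparison and is not what this notebook uses; the variable names are illustrative.

```python
import pandas as pd

# Robbery (강도) counts for three districts, copied from the table above.
robbery = pd.Series({"강남구": 39.0, "강동구": 14.0, "강북구": 5.0}, name="강도")

# Max scaling (the approach used in this notebook): largest value -> 1.
max_scaled = robbery / robbery.max()

# Min-max scaling (alternative, for comparison only): smallest -> 0, largest -> 1.
min_max_scaled = (robbery - robbery.min()) / (robbery.max() - robbery.min())

print(max_scaled.round(6).to_dict())
print(min_max_scaled.round(6).to_dict())
```

With max scaling the ratios between districts are preserved, which is probably why it suits a relative comparison like this one.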

In [296]:

crime_anal_gu.head(1)

Out[296]:

	강간	강도	살인	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	516.0	39.0	5.0	3587.0	4002.0	80.03876	100.0	100.0	53.470867	88.130935

In [297]:

# Add the arrest-rate columns
col2 = ["강간검거율", "강도검거율","살인검거율","절도검거율","폭력검거율"]
crime_anal_norm[col2] = crime_anal_gu[col2]
crime_anal_norm.head()

Out[297]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773	80.038760	100.000000	100.000000	53.470867	88.130935
강동구	0.285714	0.358974	0.310078	0.477799	0.463880	95.000000	92.857143	100.000000	51.425314	86.996047
강북구	0.500000	0.128205	0.420543	0.332879	0.509351	73.271889	80.000000	85.714286	54.991817	89.344852
관악구	0.428571	0.307692	0.624031	0.572868	0.593143	81.987578	83.333333	100.000000	44.555397	83.678516
광진구	0.285714	0.282051	0.540698	0.718060	0.438577	83.870968	54.545455	100.000000	40.098634	84.071906

In [298]:

# Add population and CCTV counts from the per-district CCTV data
result_CCTV = pd.read_csv("../data/01. CCTV_result.csv",index_col = "구별", encoding = "utf-8")
result_CCTV.head()

Out[298]:

	소계	최근증가율	인구수	한국인	외국인	고령자	외국인비율	고령자비율	CCTV비율	오차
구별
강남구	3238	150.619195	561052	556164	4888	65060	0.871220	11.596073	0.577130	1549.200326
강동구	1010	166.490765	440359	436223	4136	56161	0.939234	12.753458	0.229358	-544.642322
강북구	831	125.203252	328002	324479	3523	56530	1.074079	17.234651	0.253352	-598.750923
강서구	911	134.793814	608255	601691	6564	76032	1.079153	12.500021	0.149773	-830.268578
관악구	2109	149.290780	520929	503297	17632	70046	3.384722	13.446362	0.404854	464.799395

In [299]:

crime_anal_norm[["인구수", "CCTV"]] = result_CCTV[["인구수", "소계"]]
crime_anal_norm.head()

Out[299]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773	80.038760	100.000000	100.000000	53.470867	88.130935	561052	3238
강동구	0.285714	0.358974	0.310078	0.477799	0.463880	95.000000	92.857143	100.000000	51.425314	86.996047	440359	1010
강북구	0.500000	0.128205	0.420543	0.332879	0.509351	73.271889	80.000000	85.714286	54.991817	89.344852	328002	831
관악구	0.428571	0.307692	0.624031	0.572868	0.593143	81.987578	83.333333	100.000000	44.555397	83.678516	520929	2109
광진구	0.285714	0.282051	0.540698	0.718060	0.438577	83.870968	54.545455	100.000000	40.098634	84.071906	372298	878
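
Assigning columns from another DataFrame matches rows by index label, not by position, which is why the CCTV table can contain districts in a different order (or extra ones such as 강서구) without misaligning the result. A minimal sketch with a few values copied from the tables above; the frame names `crime` and `cctv` are illustrative:

```python
import pandas as pd

# Two districts on the crime side.
crime = pd.DataFrame({"범죄": [0.81, 0.38]}, index=["강남구", "강동구"])

# The CCTV side has a different row order and one extra district.
cctv = pd.DataFrame({"소계": [1010, 911, 3238]},
                    index=["강동구", "강서구", "강남구"])

# Assignment aligns on the index labels of the left-hand frame.
crime["CCTV"] = cctv["소계"]
print(crime)
```

Districts missing from the right-hand table would show up as NaN, so checking `isnull().sum()` after such an assignment is a cheap safeguard.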

In [300]:

# Use the mean of the normalized crime counts as the representative value in the "범죄" column

col = ["강간", "강도", "살인", "절도", "폭력"]
crime_anal_norm["범죄"] = np.mean(crime_anal_norm[col], axis = 1)
crime_anal_norm.head()

Out[300]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV	범죄
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773	80.038760	100.000000	100.000000	53.470867	88.130935	561052	3238	0.813607
강동구	0.285714	0.358974	0.310078	0.477799	0.463880	95.000000	92.857143	100.000000	51.425314	86.996047	440359	1010	0.379289
강북구	0.500000	0.128205	0.420543	0.332879	0.509351	73.271889	80.000000	85.714286	54.991817	89.344852	328002	831	0.378196
관악구	0.428571	0.307692	0.624031	0.572868	0.593143	81.987578	83.333333	100.000000	44.555397	83.678516	520929	2109	0.505261
광진구	0.285714	0.282051	0.540698	0.718060	0.438577	83.870968	54.545455	100.000000	40.098634	84.071906	372298	878	0.453020

np.mean()¶

In [301]:

np.array([0.357143, 1.000000, 1.000000, 0.977118, 0.733773])
np.mean(np.array([0.357143, 1.000000, 1.000000, 0.977118, 0.733773]))

Out[301]:

0.8136068

In [302]:

np.mean(np.array(
    [[0.357143, 1.000000, 1.000000, 0.977118, 0.733773],
    [0.285714, 0.358974, 0.310078, 0.477799, 0.463880]]),
    axis = 1
       )  # axis = 1: one mean per row (across columns) / axis = 0: one mean per column (pandas uses the same convention)

Out[302]:

array([0.8136068, 0.379289 ])
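
NumPy and pandas use the same axis convention for reductions: `axis=1` collapses the columns and returns one value per row, and `axis=0` returns one value per column. A short check using the two rows from the cell above (the DataFrame labels are taken from the crime table):

```python
import numpy as np
import pandas as pd

values = [[0.357143, 1.000000, 1.000000, 0.977118, 0.733773],
          [0.285714, 0.358974, 0.310078, 0.477799, 0.463880]]

arr = np.array(values)
df = pd.DataFrame(values, index=["강남구", "강동구"],
                  columns=["살인", "강도", "강간", "절도", "폭력"])

row_means_np = arr.mean(axis=1)   # one mean per row
row_means_pd = df.mean(axis=1)    # also one mean per row
col_means_np = arr.mean(axis=0)   # one mean per column
col_means_pd = df.mean(axis=0)    # also one mean per column

print(row_means_np, row_means_pd.to_numpy())
```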

In [303]:

# Use the mean of the arrest rates as the representative value in the "검거" column

col = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"]
crime_anal_norm["검거"] = np.mean(crime_anal_norm[col], axis = 1)
crime_anal_norm.head()

Out[303]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV	범죄	검거
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773	80.038760	100.000000	100.000000	53.470867	88.130935	561052	3238	0.813607	84.328112
강동구	0.285714	0.358974	0.310078	0.477799	0.463880	95.000000	92.857143	100.000000	51.425314	86.996047	440359	1010	0.379289	85.255701
강북구	0.500000	0.128205	0.420543	0.332879	0.509351	73.271889	80.000000	85.714286	54.991817	89.344852	328002	831	0.378196	76.664569
관악구	0.428571	0.307692	0.624031	0.572868	0.593143	81.987578	83.333333	100.000000	44.555397	83.678516	520929	2109	0.505261	78.710965
광진구	0.285714	0.282051	0.540698	0.718060	0.438577	83.870968	54.545455	100.000000	40.098634	84.071906	372298	878	0.453020	72.517393

Seaborn()¶

In [304]:

conda install -y seaborn

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.

In [305]:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc

plt.rcParams["axes.unicode_minus"] = False
rc("font", family = "Malgun Gothic")
%matplotlib inline

Example 1: seaborn basics¶

In [306]:

np.linspace(0, 14, 100)

Out[306]:

array([ 0.        ,  0.14141414,  0.28282828,  0.42424242,  0.56565657,
        0.70707071,  0.84848485,  0.98989899,  1.13131313,  1.27272727,
        1.41414141,  1.55555556,  1.6969697 ,  1.83838384,  1.97979798,
        2.12121212,  2.26262626,  2.4040404 ,  2.54545455,  2.68686869,
        2.82828283,  2.96969697,  3.11111111,  3.25252525,  3.39393939,
        3.53535354,  3.67676768,  3.81818182,  3.95959596,  4.1010101 ,
        4.24242424,  4.38383838,  4.52525253,  4.66666667,  4.80808081,
        4.94949495,  5.09090909,  5.23232323,  5.37373737,  5.51515152,
        5.65656566,  5.7979798 ,  5.93939394,  6.08080808,  6.22222222,
        6.36363636,  6.50505051,  6.64646465,  6.78787879,  6.92929293,
        7.07070707,  7.21212121,  7.35353535,  7.49494949,  7.63636364,
        7.77777778,  7.91919192,  8.06060606,  8.2020202 ,  8.34343434,
        8.48484848,  8.62626263,  8.76767677,  8.90909091,  9.05050505,
        9.19191919,  9.33333333,  9.47474747,  9.61616162,  9.75757576,
        9.8989899 , 10.04040404, 10.18181818, 10.32323232, 10.46464646,
       10.60606061, 10.74747475, 10.88888889, 11.03030303, 11.17171717,
       11.31313131, 11.45454545, 11.5959596 , 11.73737374, 11.87878788,
       12.02020202, 12.16161616, 12.3030303 , 12.44444444, 12.58585859,
       12.72727273, 12.86868687, 13.01010101, 13.15151515, 13.29292929,
       13.43434343, 13.57575758, 13.71717172, 13.85858586, 14.        ])

In [307]:

x = np.linspace(0, 14, 100)
y1 = np.sin(x)
y2 = 2 * np.sin(x + 0.5)
y3 = 3 * np.sin(x + 1)
y4 = 4 * np.sin(x + 1.5)

In [308]:

plt.figure(figsize =(10, 6))
plt.plot(x, y1, x, y2, x, y3, x, y4)
plt.show()

In [309]:

# sns.set_style() accepts: white, whitegrid, dark, darkgrid, ticks
sns.set_style("white")
plt.figure(figsize =(10, 6))
plt.plot(x, y1, x, y2, x, y3, x, y4)
plt.show()

Example 2: seaborn tips data¶

boxplot
swarmplot
lmplot

In [310]:

tips = sns.load_dataset("tips")
tips

Out[310]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

244 rows × 7 columns

In [311]:

tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB

In [312]:

#boxplot
plt.figure(figsize = (8, 6))
sns.boxplot(x=tips["total_bill"])
plt.show()

In [313]:

#boxplot
plt.figure(figsize = (8, 6))
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()

In [314]:

# boxplot hue, palette option

plt.figure(figsize = (8, 6))
sns.boxplot(x = "day", y = "total_bill", data = tips, hue ="smoker", palette = "Set3")

Out[314]:

<AxesSubplot: xlabel='day', ylabel='total_bill'>

In [315]:

# swarmplot
# color: a grayscale string between "0" (black) and "1" (white)

plt.figure(figsize = (8, 6))
sns.swarmplot(x="day", y="total_bill", data = tips, color = "0.7")

Out[315]:

<AxesSubplot: xlabel='day', ylabel='total_bill'>

In [316]:

# boxplot with swarmplot

plt.figure(figsize=(8, 6))
sns.boxplot(x="day", y = "total_bill", data=tips)
sns.swarmplot(x="day", y="total_bill", data=tips, color = "0.25" )
plt.show()

In [317]:

# lmplot: examine the relationship between total_bill and tip
sns.set_style("darkgrid")
sns.lmplot(x="total_bill", y="tip", data=tips, height = 7, hue="smoker");

Example 3: flights data¶

heatmap

In [318]:

flights = sns.load_dataset("flights")
flights.head()

Out[318]:

	year	month	passengers
0	1949	Jan	112
1	1949	Feb	118
2	1949	Mar	132
3	1949	Apr	129
4	1949	May	121

In [319]:

flights.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 144 entries, 0 to 143
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   year        144 non-null    int64   
 1   month       144 non-null    category
 2   passengers  144 non-null    int64   
dtypes: category(1), int64(2)
memory usage: 2.9 KB

In [320]:

# pivot
# index, columns, values

flights = flights.pivot(index="month", columns = "year", values = "passengers")
flights.head()

Out[320]:

year	1949	1950	1951	1952	1953	1954	1955	1956	1957	1958	1959	1960
month
Jan	112	115	145	171	196	204	242	284	315	340	360	417
Feb	118	126	150	180	196	188	233	277	301	318	342	391
Mar	132	141	178	193	236	235	267	317	356	362	406	419
Apr	129	135	163	181	235	227	269	313	348	348	396	461
May	121	125	172	183	229	234	270	318	355	363	420	472
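
`pivot` only reshapes and requires each (index, column) pair to be unique, while `pivot_table` aggregates duplicates. The sketch below rebuilds a tiny slice of the flights data by hand (values copied from the table above) and then adds one duplicate row to show the difference; the duplicate is artificial and exists only for the demonstration.

```python
import pandas as pd

long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Pure reshape: works because every (month, year) pair appears once.
wide = long_df.pivot(index="month", columns="year", values="passengers")

# Add an artificial duplicate of the first row.
dup = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)

try:
    dup.pivot(index="month", columns="year", values="passengers")
    pivot_failed = False
except ValueError:
    pivot_failed = True  # duplicate entries cannot be reshaped

# pivot_table resolves duplicates with an aggregation function.
summed = dup.pivot_table(index="month", columns="year",
                         values="passengers", aggfunc="sum")
print(wide)
print(summed)
```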

In [321]:

# heatmap

plt.figure(figsize = (10, 8))
sns.heatmap(data=flights, annot = True, fmt = "d")
# annot: show the data values / fmt: format string / fmt = "d": display as integers
plt.show()

In [322]:

#colormap

plt.figure(figsize = (10, 8))
sns.heatmap(flights, annot=True, fmt="d", cmap = "YlGnBu")
plt.show()

Example 4: iris data¶

pairplot

In [323]:

iris = sns.load_dataset("iris")
iris.tail()

Out[323]:

	sepal_length	sepal_width	petal_length	petal_width	species
145	6.7	3.0	5.2	2.3	virginica
146	6.3	2.5	5.0	1.9	virginica
147	6.5	3.0	5.2	2.0	virginica
148	6.2	3.4	5.4	2.3	virginica
149	5.9	3.0	5.1	1.8	virginica

In [324]:

# pairplot

sns.set_style("ticks")
sns.pairplot(iris)
plt.show()

In [325]:

iris.head(2)

Out[325]:

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa

In [326]:

iris["species"].unique()

Out[326]:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

In [327]:

#hue option

sns.pairplot(iris, hue="species")

Out[327]:

<seaborn.axisgrid.PairGrid at 0x18f5bc92790>

In [328]:

# pairplot restricted to selected columns
sns.pairplot(iris,
             x_vars=["sepal_width", "sepal_length"],
             y_vars = ["petal_width","petal_length"])
plt.show()

Example 5: anscombe data¶

lmplot

In [329]:

anscombe = sns.load_dataset("anscombe")
anscombe.head()

Out[329]:

	dataset	x	y
0	I	10.0	8.04
1	I	8.0	6.95
2	I	13.0	7.58
3	I	9.0	8.81
4	I	11.0	8.33

In [330]:

anscombe["dataset"].unique()

Out[330]:

array(['I', 'II', 'III', 'IV'], dtype=object)

In [331]:

sns.set_style("darkgrid")
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"), ci=None, height = 7) # ci: confidence interval for the regression (None disables it)
plt.show()

In [332]:

sns.set_style("darkgrid")
sns.lmplot(
    x="x",
    y="y",
    data=anscombe.query("dataset == 'I'"),
    ci=None, # ci: confidence interval for the regression (None disables it)
    height = 7,
    scatter_kws={"s":30})
plt.show()

In [333]:

#order option

sns.set_style("darkgrid")
sns.lmplot(
    x="x",
    y="y",
    data=anscombe.query("dataset == 'II'"),
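    order = 2,  # fit a second-order (quadratic) polynomial instead of a straight line
    ci=None, # ci: confidence interval for the regression (None disables it)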
    height = 7,
    scatter_kws={"s":30})
plt.show()
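
With `order` greater than 1, seaborn estimates a polynomial regression using `numpy.polyfit`. The sketch below runs the same kind of fit directly on hypothetical, exactly quadratic data (not the anscombe values) to show why a degree-2 fit follows a curved pattern that a straight line cannot.

```python
import numpy as np

# Hypothetical data that follows y = 0.5x^2 - 3x + 2 exactly.
x = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = 0.5 * x**2 - 3.0 * x + 2.0

line = np.polyfit(x, y, deg=1)    # straight line (like order=1)
curve = np.polyfit(x, y, deg=2)   # quadratic (like order=2)

# Largest absolute error of each fit on the sample points.
line_resid = np.abs(np.polyval(line, x) - y).max()
curve_resid = np.abs(np.polyval(curve, x) - y).max()

print("quadratic coefficients:", np.round(curve, 6))
print("max residual, line vs curve:", line_resid, curve_resid)
```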

In [334]:

!pip install statsmodels

Requirement already satisfied: statsmodels in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (0.13.5)
Requirement already satisfied: patsy>=0.5.2 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from statsmodels) (0.5.3)
Requirement already satisfied: scipy>=1.3 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from statsmodels) (1.10.1)
Requirement already satisfied: packaging>=21.3 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from statsmodels) (22.0)
Requirement already satisfied: pandas>=0.25 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from statsmodels) (1.5.2)
Requirement already satisfied: numpy>=1.17 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from statsmodels) (1.23.5)
Requirement already satisfied: pytz>=2020.1 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from pandas>=0.25->statsmodels) (2022.7)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from pandas>=0.25->statsmodels) (2.8.2)
Requirement already satisfied: six in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from patsy>=0.5.2->statsmodels) (1.16.0)

In [335]:

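# outlier: robust=True fits a robust regression that down-weights outliers (requires statsmodels)

sns.set_style("darkgrid")
sns.lmplot(
    x="x",
    y="y",
    data=anscombe.query("dataset == 'III'"),
    robust=True,
    ci=None, # ci: confidence interval for the regression (None disables it)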
    height = 7,
    scatter_kws={"s":30})
plt.show()

9. Visualizing the Seoul Crime Data¶

In [336]:

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc

plt.rcParams["axes.unicode_minus"] = False
rc("font", family = "Malgun Gothic")
%matplotlib inline

In [337]:

crime_anal_norm.head()

Out[337]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV	범죄	검거
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773	80.038760	100.000000	100.000000	53.470867	88.130935	561052	3238	0.813607	84.328112
강동구	0.285714	0.358974	0.310078	0.477799	0.463880	95.000000	92.857143	100.000000	51.425314	86.996047	440359	1010	0.379289	85.255701
강북구	0.500000	0.128205	0.420543	0.332879	0.509351	73.271889	80.000000	85.714286	54.991817	89.344852	328002	831	0.378196	76.664569
관악구	0.428571	0.307692	0.624031	0.572868	0.593143	81.987578	83.333333	100.000000	44.555397	83.678516	520929	2109	0.505261	78.710965
광진구	0.285714	0.282051	0.540698	0.718060	0.438577	83.870968	54.545455	100.000000	40.098634	84.071906	372298	878	0.453020	72.517393

In [338]:

# pairplot: check the correlation among murder (살인), robbery (강도), and violence (폭력)

sns.pairplot(data=crime_anal_norm, vars=["살인", "강도", "폭력"], kind ="reg", height = 3);
# kind: scatter, kde, hist, reg

In [339]:

crime_anal_norm.head(1)

Out[339]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV	범죄	검거
구별
강남구	0.357143	1.0	1.0	0.977118	0.733773	80.03876	100.0	100.0	53.470867	88.130935	561052	3238	0.813607	84.328112

In [340]:

# Check the correlation of "인구수" (population) and "CCTV" with "살인" (murder) and "강도" (robbery)

def drawGraph():
    sns.pairplot(
        data=crime_anal_norm,
        x_vars = ["인구수", "CCTV"],
        y_vars = ["살인", "강도"],
        kind = "reg",
        height = 4)
    plt.show()
    
drawGraph()

In [341]:

# Check the correlation of "인구수" (population) and "CCTV" with "살인검거율" and "폭력검거율" (murder and violence arrest rates)

def drawGraph():
    sns.pairplot(
        data=crime_anal_norm,
        x_vars = ["인구수", "CCTV"],
        y_vars = ["살인검거율", "폭력검거율"],
        kind = "reg",
        height = 4)
    plt.show()
    
drawGraph()

In [342]:

# Check the correlation of "인구수" (population) and "CCTV" with "절도검거율" and "강도검거율" (theft and robbery arrest rates)

def drawGraph():
    sns.pairplot(
        data=crime_anal_norm,
        x_vars = ["인구수", "CCTV"],
        y_vars = ["절도검거율", "강도검거율"],
        kind = "reg",
        height = 4)
    plt.show()
    
drawGraph()
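
The regression pairplots show the relationships visually; `DataFrame.corr()` gives the matching Pearson coefficients as numbers. The sketch below uses only the five rows shown in the `head()` output above, so the coefficients are illustrative and far too few points to draw conclusions from. Running `crime_anal_norm[["인구수", "CCTV", "강도"]].corr()` on the full table would be the real check.

```python
import pandas as pd

# First five rows of crime_anal_norm, copied from the output above.
sample = pd.DataFrame(
    {"인구수": [561052, 440359, 328002, 520929, 372298],
     "CCTV": [3238, 1010, 831, 2109, 878],
     "강도": [1.000000, 0.358974, 0.128205, 0.307692, 0.282051]},
    index=["강남구", "강동구", "강북구", "관악구", "광진구"])

# Pearson correlation matrix between every pair of columns.
corr = sample.corr()
print(corr.round(3))
```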

In [343]:

crime_anal_norm.head(3)

Out[343]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV	범죄	검거
구별
강남구	0.357143	1.000000	1.000000	0.977118	0.733773	80.038760	100.000000	100.000000	53.470867	88.130935	561052	3238	0.813607	84.328112
강동구	0.285714	0.358974	0.310078	0.477799	0.463880	95.000000	92.857143	100.000000	51.425314	86.996047	440359	1010	0.379289	85.255701
강북구	0.500000	0.128205	0.420543	0.332879	0.509351	73.271889	80.000000	85.714286	54.991817	89.344852	328002	831	0.378196	76.664569

In [344]:

# Arrest-rate heatmap
# Sorted by the "검거" column

def drawGraph():
    
    # Build the DataFrame
    target_col = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율", "검거"]
    crime_anal_norm_sort = crime_anal_norm.sort_values(by="검거", ascending=False)  # descending order
    
    # Plot settings
    plt.figure(figsize = (10, 10))
    sns.heatmap(
        data = crime_anal_norm_sort[target_col],
        annot = True, # show the data values
        fmt = "f",
        linewidths = 1, # spacing between cells
        cmap = "RdPu"
        )
    plt.title("Arrest rates by crime type (sorted by mean arrest rate)")
    plt.show()
    
drawGraph()

In [345]:

# Crime-count heatmap
# Sorted by the "범죄" column

def drawGraph():
    
    # Build the DataFrame
    target_col = ["살인", "강도", "강간", "절도", "폭력", "범죄"]
    crime_anal_norm_sort = crime_anal_norm.sort_values(by="범죄", ascending = False)
    
    # Plot settings
    plt.figure(figsize=(10,10))
    sns.heatmap(
        data=crime_anal_norm_sort[target_col],
        annot=True,
        fmt = "f",
        linewidths=0.5,
        cmap="RdPu",
    )
    plt.title("Crime levels (sorted by mean normalized count)")
    plt.show()
    
drawGraph()

In [346]:

# Save the data

crime_anal_norm.to_csv("../data/02. crime_in_Seoul_final.csv", sep=",", encoding = "utf-8")
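
`to_csv` writes the index as the first column, so the file has to be read back with `index_col=0` (or `index_col="구별"`) to restore the district names as the index. A small in-memory round trip, using `io.StringIO` in place of a file and two values copied from the table above:

```python
import io
import pandas as pd

df = pd.DataFrame({"범죄": [0.813607, 0.379289]},
                  index=pd.Index(["강남구", "강동구"], name="구별"))

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, sep=",")

# Reading with index_col=0 restores the district index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Without index_col, the districts come back as an ordinary column.
buffer.seek(0)
flat = pd.read_csv(buffer)

print(restored)
print(flat)
```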

folium¶

In [347]:

!pip install folium

Requirement already satisfied: folium in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (0.14.0)
Requirement already satisfied: numpy in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from folium) (1.23.5)
Requirement already satisfied: requests in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from folium) (2.28.1)
Requirement already satisfied: branca>=0.6.0 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from folium) (0.6.0)
Requirement already satisfied: jinja2>=2.9 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from folium) (3.1.2)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from jinja2>=2.9->folium) (2.1.1)
Requirement already satisfied: charset-normalizer<3,>=2 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from requests->folium) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from requests->folium) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from requests->folium) (1.26.14)
Requirement already satisfied: idna<4,>=2.5 in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (from requests->folium) (3.4)

In [348]:

import folium
import pandas as pd
import json

folium.Map()¶

location: tuple or list, default None
    Latitude and Longitude of Map (Northing, Easting).

In [349]:

m = folium.Map(location=[37.544564958079896, 127.05582307754338], zoom_start = 14) #0~18
m

Out[349]:

Make this Notebook Trusted to load map: File -> Trust Notebook

save(path)¶

In [350]:

m.save("./folium.html")

tiles option¶

- "OpenStreetMap"
- "Mapbox Bright" (Limited levels of zoom for free tiles)
- "Mapbox Control Room" (Limited levels of zoom for free tiles)
- "Stamen" (Terrain, Toner, and Watercolor)
- "Cloudmade" (Must pass API key)
- "Mapbox" (Must pass API key)
- "CartoDB" (positron and dark_matter)

In [351]:

m = folium.Map(
    location=[37.544564958079896, 127.05582307754338],
    zoom_start = 14,
    tiles="OpenStreetMap")

m

Out[351]:

Make this Notebook Trusted to load map: File -> Trust Notebook

folium.Marker()¶

Creates a marker on the map.

In [352]:

m = folium.Map(
    location=[37.544564958079896, 127.05582307754338], # Seongsu Station (성수역)
    zoom_start = 14,
    tiles="OpenStreetMap")

# Ttukseom Station (뚝섬역)
folium.Marker((37.54712311308356, 127.04721916917774)).add_to(m)

# Seongsu Station (성수역)
folium.Marker([37.544564958079896, 127.05582307754338],
            popup = "<b>Subway</b>",
            tooltip = "<i>성수역</i>"
             ).add_to(m)


# Zerobase
folium.Marker([37.54558642069953,127.05729705810472],
              popup = "<a href='https://zero-base.co.kr/' target='_blank'>제로베이스</a>",
              tooltip = "<i>Zerobase</i>"
             ).add_to(m)

m

Out[352]:

Make this Notebook Trusted to load map: File -> Trust Notebook

folium.Icon()¶

In [353]:

m = folium.Map(
    location=[37.544564958079896, 127.05582307754338], # Seongsu Station (성수역)
    zoom_start = 14,
    tiles="OpenStreetMap")

# icon basic
folium.Marker((37.54712311308356, 127.04721916917774),
              icon=folium.Icon(color='black',icon = "info-sign")
             ).add_to(m)

# Seongsu Station (성수역)
folium.Marker([37.544564958079896, 127.05582307754338],
            popup = "<b>Subway</b>",
            tooltip = "<i>성수역</i>",
            icon=folium.Icon(
            color='red',
            icon_color = 'blue',
            icon = "cloud")
             ).add_to(m)


# Zerobase
folium.Marker([37.54558642069953,127.05729705810472],
              popup = "<a href='https://zero-base.co.kr/' target='_blank'>제로베이스</a>",
              tooltip = "<i>Zerobase</i>"
             ).add_to(m)

#Icon custom
folium.Marker(
    location=[37.54035903907497, 127.06913328776446],
    popup="<i>건대입구역</i>",
    tooltip = "Icon custom",
    icon=folium.Icon(color="purple",
                    icon_color="green",
                    icon="fa-brands fa-instagram",
                    angle=50,
                    prefix = "fa" #fa, glyphicon
                    )).add_to(m)

m

Out[353]:

Make this Notebook Trusted to load map: File -> Trust Notebook

folium.ClickForMarker()¶

Creates a marker wherever the map is clicked.

In [354]:

m = folium.Map(
    location=[37.544564958079896, 127.05582307754338], # Seongsu Station (성수역)
    zoom_start = 14,
    tiles="OpenStreetMap")

# The keyword argument is popup, not pop
m.add_child(folium.ClickForMarker(popup="ClickForMarker"))

folium.LatLngPopup()¶

Shows the latitude and longitude of the clicked point in a popup.

In [ ]:

m = folium.Map(
    location = [37.5301, 127.0403],
    zoom_start = 14,
    tiles="OpenStreetMap")

m.add_child(folium.LatLngPopup())

folium.Circle(), folium.CircleMarker()¶

In [ ]:

m = folium.Map(
    location=[37.544564958079896, 127.05582307754338], # Seongsu Station (성수역)
    zoom_start = 14,
    tiles="OpenStreetMap")

# Circle: radius is in meters, so the circle scales with the zoom level
folium.Circle([37.5366,127.0099],
              radius = 100,
              fill = False,
             color = "#eb9e34",
             fill_color = "red",
             popup = "Circle popup",
             tooltip = "Circle Tooltip").add_to(m)

# CircleMarker: radius is in pixels, so the circle keeps its screen size when zooming
folium.CircleMarker([37.5444, 127.0404],
              radius = 100,
              fill = False,
             color = "#34ebc6",
             fill_color = "blue",
             popup = "CircleMarker popup",
             tooltip = "CircleMarker Tooltip").add_to(m)
m

folium.Choropleth()¶

In [ ]:

import json

In [ ]:

state_data = pd.read_csv("../data/02. US_Unemployment_Oct2012.csv")
state_data.tail(2)

In [ ]:

m = folium.Map([43, -102], zoom_start = 3)
folium.Choropleth(
    geo_data = "../data/02. us-states.json", # GeoJSON with the boundary coordinates
    data=state_data, # Series or DataFrame
    columns = ["State", "Unemployment"],
    key_on = "feature.id",
    fill_color = "BuPu",
    fill_opacity = 1, # 0 to 1
    line_opacity = 1,
    legend_name = "Unemployment rate (%)"
).add_to(m)

m

Mapping Housing Types¶

Source: Public Data Portal (공공데이터포털)
http://www.data.go.kr/data/15066101/fileData.do

In [ ]:

import pandas as pd

In [ ]:

df = pd.read_csv("../data/서울특별시 동작구_주택유형별 위치 정보 및 세대수 현황_20220818.csv", encoding = "cp949")
df.info()

In [ ]:

# Drop rows containing NaN
df = df.dropna()
df.info()

In [ ]:

df = df.reset_index(drop=False)
df

In [ ]:

df = df.rename(columns = {"연번 ":"연번","분류 ":"분류"})
del df["연번"]
del df["index"]
df

In [ ]:

df.iloc[0].위도  # attribute-style access to the latitude (위도) of one row

In [ ]:

df.describe()

In [ ]:

# folium

m = folium.Map(location=[37.50589466533131, 126.93450729567374], zoom_start = 13)
for idx, row in df.iterrows():
    # location
    lat, lng = row.위도, row.경도
    
    # Marker
    folium.Marker(
        location = [lat, lng],
        popup = row.주소,
        tooltip = row.분류,
        icon = folium.Icon(
            icon = "home",
            color = "lightred" if row.세대수 >=199 else "lightblue",
            icon_color = "darkred" if row.세대수 >=199 else "darkblue"
        )
    ).add_to(m)
    
    # Circle
    folium.Circle(
        location = [lat,lng],
        radius = row.세대수 * 0.2,  # circle size proportional to the number of households (세대수)
        fill = True,
        color = "black" if row.세대수 >518 else "green",
        fill_color = "black" if row.세대수 >518 else "green",
        opacity = 1
    ).add_to(m)
m
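
The loop above repeats the same household-count condition for each color argument. One way to keep the thresholds in a single place is a small helper that returns both colors at once. `marker_style` is a hypothetical name introduced here for illustration, and the default threshold simply mirrors the `>= 199` cut used in the loop.

```python
def marker_style(households, threshold=199):
    """Return (color, icon_color) for a marker.

    Hypothetical helper: households at or above the threshold
    get the red pair, all others get the blue pair.
    """
    if households >= threshold:
        return "lightred", "darkred"
    return "lightblue", "darkblue"


# Example: unpack the pair where the folium.Icon arguments are built.
color, icon_color = marker_style(250)
print(color, icon_color)
```

Inside the loop this would be used as `color, icon_color = marker_style(row.세대수)` before constructing `folium.Icon`.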

10. Mapping the Seoul Crime Data¶

In [357]:

import json
import folium
import pandas as pd

In [358]:

crime_anal_norm = pd.read_csv("../data/02. crime_in_Seoul_final.csv", index_col=0, encoding = "utf-8")
# index_col: use the given column as the index

geo_path="../data/02. skorea_municipalities_geo_simple.json"
geo_str = json.load(open(geo_path, encoding = "utf-8"))

In [359]:

crime_anal_norm.tail(2)

Out[359]:

	살인	강도	강간	절도	폭력	강간검거율	강도검거율	살인검거율	절도검거율	폭력검거율	인구수	CCTV	범죄	검거
구별
중구	0.214286	0.205128	0.383721	0.585671	0.407957	74.747475	87.5	100.0	42.511628	89.707865	134593	1023	0.359353	78.893394
중랑구	0.571429	0.358974	0.317829	0.460637	0.580125	91.463415	100.0	87.5	62.211709	85.714286	412780	916	0.457799	85.377882

In [370]:

# Map of normalized murder counts

my_map = folium.Map(
    location = [37.552, 126.982],
    zoom_start = 11,
    tiles = "Stamen Toner"
)

folium.Choropleth(
    geo_data = geo_str, # GeoJSON with the district boundary coordinates
    data = crime_anal_norm["살인"],
    columns = [crime_anal_norm.index, crime_anal_norm["살인"]],
    key_on = "feature.id",
    fill_color = "PuRd",
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name="Normalized murder count"
).add_to(my_map)

my_map

Out[370]:

Make this Notebook Trusted to load map: File -> Trust Notebook
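
A choropleth only colors a region when the value found at `key_on` in the GeoJSON matches a key in the data, so comparing the two sets of keys before plotting can catch blank districts early. The sketch below uses a tiny hand-written GeoJSON string; it assumes, as `key_on = "feature.id"` suggests, that each feature's `id` holds the district name. For the real data the same comparison would use `geo_str["features"]` and `crime_anal_norm.index`.

```python
import json

# Hypothetical miniature GeoJSON; geometry is omitted (null) for brevity.
geo_text = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "id": "강남구", "properties": {}, "geometry": null},
  {"type": "Feature", "id": "강동구", "properties": {}, "geometry": null}
]}"""
geo = json.loads(geo_text)

# Keys on the data side (for example, the DataFrame index).
data_keys = ["강남구", "강동구", "강북구"]

feature_ids = {feature["id"] for feature in geo["features"]}
missing_in_geo = sorted(set(data_keys) - feature_ids)    # data rows with no shape
missing_in_data = sorted(feature_ids - set(data_keys))   # shapes with no data

print("no boundary for:", missing_in_geo)
print("no data for:", missing_in_data)
```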

In [369]:

# Map of the five major crimes combined

my_map = folium.Map(
    location = [37.552, 126.982],
    zoom_start = 11,
    tiles = "Stamen Toner"
)

folium.Choropleth(
    geo_data = geo_str, # GeoJSON with the district boundary coordinates
    data = crime_anal_norm["범죄"],
    columns = [crime_anal_norm.index, crime_anal_norm["범죄"]],
    key_on = "feature.id",
    fill_color = "PuRd",
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name="Normalized count of the five major crimes"
).add_to(my_map)

my_map

Out[369]:

Make this Notebook Trusted to load map: File -> Trust Notebook

In [383]:

# Crime index relative to population

tmp_criminal = crime_anal_norm["범죄"] / crime_anal_norm["인구수"]

my_map = folium.Map(
    location = [37.552, 126.982],
    zoom_start = 11,
    tiles = "Stamen Toner"
)

folium.Choropleth(
    geo_data = geo_str, # GeoJSON with the district boundary coordinates
    data = tmp_criminal,
    columns = [crime_anal_norm.index, tmp_criminal],
    key_on = "feature.id",
    fill_color = "PuRd",
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name="Normalized crime index relative to population",
).add_to(my_map)

my_map

Out[383]:

Make this Notebook Trusted to load map: File -> Trust Notebook
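
Dividing the normalized crime index by the raw population yields values on the order of one millionth, which makes the map legend hard to read. Rescaling by a constant keeps the ranking intact while producing friendlier numbers. The sketch below uses three rows copied from the tables above; the factor of 1,000,000 is an arbitrary choice for readability, and the result is still a relative index rather than a count of crimes per resident.

```python
import pandas as pd

crime = pd.Series({"강남구": 0.813607, "강동구": 0.379289, "강북구": 0.378196},
                  name="범죄")
population = pd.Series({"강남구": 561052, "강동구": 440359, "강북구": 328002},
                       name="인구수")

per_capita = crime / population        # aligned on district name
per_million = per_capita * 1_000_000   # same ranking, readable scale

ranking = per_million.sort_values(ascending=False)
print(ranking.round(3))
```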

In [375]:

# Combine per-station information with the crime counts

crime_anal_station = pd.read_csv(
    "../data/02. crime_in_Seoul_raw.csv", encoding ="utf-8"
)

crime_anal_station.head()

Out[375]:

	구분	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생	구별	lat	lng
0	강남	269.0	339.0	26.0	24.0	3.0	3.0	1129.0	2438.0	2096.0	2336.0	강남구	37.509435	127.066958
1	강동	152.0	160.0	13.0	14.0	5.0	4.0	902.0	1754.0	2201.0	2530.0	강동구	37.528511	127.126822
2	강북	159.0	217.0	4.0	5.0	6.0	7.0	672.0	1222.0	2482.0	2778.0	강북구	37.637197	127.027305
3	강서	239.0	275.0	10.0	10.0	10.0	9.0	1070.0	1952.0	2768.0	3204.0	양천구	37.539783	126.829997
4	관악	264.0	322.0	10.0	12.0	7.0	6.0	937.0	2103.0	2707.0	3235.0	관악구	37.474395	126.951349

In [378]:

col = ["살인검거","강도검거","강간검거","절도검거","폭력검거"]
tmp = crime_anal_station[col] / crime_anal_station[col].max() # scale each column by its maximum (largest value becomes 1)
crime_anal_station["검거"] = np.mean(tmp, axis = 1) # axis=1 averages across the columns, one value per row (same meaning in NumPy and pandas)
crime_anal_station.tail(2)

Out[378]:

	구분	강간검거	강간발생	강도검거	강도발생	살인검거	살인발생	절도검거	절도발생	폭력검거	폭력발생	구별	lat	lng	검거
29	중부	96.0	141.0	3.0	3.0	2.0	2.0	485.0	1204.0	1164.0	1335.0	중구	37.563617	126.989652	0.277182
30	혜화	64.0	101.0	6.0	6.0	2.0	2.0	379.0	988.0	842.0	972.0	종로구	37.571968	126.998957	0.240065

In [388]:

# Mark the police station locations

my_map = folium.Map(
    location = [37.5502, 126.982], zoom_start = 11
)

folium.Choropleth(
    geo_data=geo_str,
    data=crime_anal_norm["범죄"],
    columns = [crime_anal_norm.index, crime_anal_norm["범죄"]],
    key_on = "feature.id",
    fill_color = "PuRd",
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name="Normalized count of the five major crimes"
).add_to(my_map)

for idx, rows in crime_anal_station.iterrows():
    folium.CircleMarker(
        location=[rows["lat"], rows["lng"]],
        radius = rows["검거"] * 50,
        popup = rows["구분"] + ":" + "%.2f" % rows["검거"],
        color = "#3186cc",
        fill = True,
        fill_color = "#3186cc"
    ).add_to(my_map)
    
my_map

Out[388]:

Make this Notebook Trusted to load map: File -> Trust Notebook

11. Analyzing Where Crimes Occur in Seoul¶

In [393]:

# Additional check

crime_loc_row = pd.read_csv(
    "../data/02. crime_in_Seoul_location.csv", thousands =",", encoding = "euc-kr")
crime_loc_row.tail(2)

Out[393]:

	범죄명	장소	발생건수
63	폭력	금융기관	42
64	폭력	기타	26382

In [394]:

crime_loc_row.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65 entries, 0 to 64
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   범죄명     65 non-null     object
 1   장소      65 non-null     object
 2   발생건수    65 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 1.6+ KB

In [395]:

crime_loc_row["범죄명"].unique()

Out[395]:

array(['살인', '강도', '강간.추행', '절도', '폭력'], dtype=object)

In [396]:

crime_loc_row["장소"].unique()

Out[396]:

array(['아파트, 연립 다세대', '단독주택', '노상', '상점', '숙박업소, 목욕탕', '유흥 접객업소', '사무실',
       '역, 대합실', '교통수단', '유원지 ', '학교', '금융기관', '기타'], dtype=object)

In [402]:

crime_loc = crime_loc_row.pivot_table(
    values = "발생건수", index = "장소", columns = "범죄명", aggfunc = "sum")

crime_loc

Out[402]:

범죄명	강간.추행	강도	살인	절도	폭력
장소
교통수단	691	0	0	457	222
금융기관	2	1	1	1081	42
기타	2128	67	65	21734	26382
노상	986	87	22	9329	24535
단독주택	395	15	30	2241	3579
사무실	132	8	1	682	1229
상점	95	34	1	4403	852
숙박업소, 목욕탕	389	9	4	828	303
아파트, 연립 다세대	284	18	12	1504	2839
역, 대합실	181	0	0	356	272
유원지	59	2	2	367	424
유흥 접객업소	398	13	8	2035	2645
학교	33	0	0	400	203

In [406]:

crime_loc_norm = crime_loc / crime_loc.max()  # normalize each column by its maximum
crime_loc_norm.head()

Out[406]:

범죄명	강간.추행	강도	살인	절도	폭력
장소
교통수단	0.324718	0.000000	0.000000	0.021027	0.008415
금융기관	0.000940	0.011494	0.015385	0.049738	0.001592
기타	1.000000	0.770115	1.000000	1.000000	1.000000
노상	0.463346	1.000000	0.338462	0.429235	0.929990
단독주택	0.185620	0.172414	0.461538	0.103110	0.135661

In [407]:

crime_loc_norm["종합"] = np.mean(crime_loc_norm, axis = 1)
crime_loc_norm.tail(2)

Out[407]:

범죄명	강간.추행	강도	살인	절도	폭력	종합
장소
유흥 접객업소	0.187030	0.149425	0.123077	0.093632	0.100258	0.130684
학교	0.015508	0.000000	0.000000	0.018404	0.007695	0.008321

In [410]:

crime_loc_norm_sort = crime_loc_norm.sort_values("종합", ascending = False)

def drawGraph():
    plt.figure(figsize = (10, 10))
    sns.heatmap(
    crime_loc_norm_sort, annot = True, fmt = "f", linewidths = 0.5, cmap = "RdPu"
    )
    
    plt.title("Crime locations")
    plt.show()
    
drawGraph()
