EDA) Using the Naver API | EDA | 2023. 3. 10. 23:04
06. Naver API
1. Registering to use the Naver API
- Naver Developers Center (네이버 개발자 센터)
- Application
- Register an application
- Select the APIs to use
- Add environment - Web settings - http://localhost
- client ID: YcEkBF8FxV1SDWUqscNW
- Client Secret: OQCYXHHUbR
- https://developers.naver.com/apps/#/myapps/YcEkBF8FxV1SDWUqscNW/overview
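- urllib: the standard-library package for sending requests to a server over HTTP and handling the responses
- urllib.request: opens URLs and sends the client's requests
- urllib.parse: parses URL strings and percent-encodes (quotes) their parts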
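As a quick illustration of what `urllib.parse` contributes, the sketch below percent-encodes a Korean query with `quote`, reverses it with `unquote`, and splits a finished search URL back into its parts with `urlparse` and `parse_qs`. The sample URL is the one `gen_search_url("shop","몰스킨",10,3)` produces later in this post; nothing is sent over the network.

```python
from urllib.parse import quote, unquote, urlparse, parse_qs

# Percent-encode a Korean search term (as UTF-8 bytes), as the search cells do
encoded = quote("몰스킨")
print(encoded)           # %EB%AA%B0%EC%8A%A4%ED%82%A8
print(unquote(encoded))  # 몰스킨

# Split a complete search URL back into host, path and query parameters
url = ("https://openapi.naver.com/v1/search/shop.json"
       "?query=%EB%AA%B0%EC%8A%A4%ED%82%A8&start=10&display=3")
parts = urlparse(url)
params = parse_qs(parts.query)  # values are lists because a key may repeat
print(parts.netloc, parts.path)  # openapi.naver.com /v1/search/shop.json
print(params)  # {'query': ['몰스킨'], 'start': ['10'], 'display': ['3']}
```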
In [ ]:
import urllib.parse
import urllib.request

client_id = "YcEkBF8FxV1SDWUqscNW"
client_secret = "OQCYXHHUbR"

encText = urllib.parse.quote("파이썬")  # percent-encode the Korean query
url = "https://openapi.naver.com/v1/search/blog?query=" + encText  # JSON result
# url = "https://openapi.naver.com/v1/search/blog.xml?query=" + encText  # XML result

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id", client_id)
request.add_header("X-Naver-Client-Secret", client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if rescode == 200:
    response_body = response.read()
    print(response_body.decode("utf-8"))  # decode the bytes as UTF-8 to get readable text
else:
    print("Error Code:" + str(rescode))  # str() is needed: rescode is an int
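The search cells in this post differ only in the endpoint name (blog, book, movie, cafearticle, encyc, shop) and the query. A minimal sketch that factors out the shared part follows; `build_search_request` is a hypothetical helper name, and it only builds the `Request` object, so it can be inspected without network access. Passing the result to `urllib.request.urlopen` would actually send it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://openapi.naver.com/v1/search/"

def build_search_request(api_node, query, client_id, client_secret, fmt="json"):
    """Hypothetical helper: build (but do not send) a Naver search request.

    api_node is an endpoint name used in this post, e.g. "blog" or "book";
    fmt selects the ".json" or ".xml" variant of the endpoint.
    """
    url = BASE_URL + api_node + "." + fmt + "?query=" + urllib.parse.quote(query)
    request = urllib.request.Request(url)
    # The same two authentication headers the cells set by hand
    request.add_header("X-Naver-Client-Id", client_id)
    request.add_header("X-Naver-Client-Secret", client_secret)
    return request

req = build_search_request("book", "파이썬", "my-id", "my-secret")
print(req.full_url)
# add_header stores names via str.capitalize(), hence this spelling
print(req.get_header("X-naver-client-id"))  # my-id
```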
In [2]:response, response.getcode(), response.code, response.status
Out[2]:(<http.client.HTTPResponse at 0x22a9e1a5280>, 200, 200, 200)
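One caveat about the `if rescode == 200 ... else` pattern: `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses instead of returning them, so the `else` branch is effectively never reached. A sketch that catches the exception instead is below; `fetch_text` is a hypothetical name, and its `opener` parameter exists only so the error path can be demonstrated offline with a stand-in function.

```python
import urllib.error
import urllib.request

def fetch_text(request, opener=urllib.request.urlopen, timeout=10):
    """Hypothetical helper: return (status, body_text) for a request.

    HTTP errors are caught and returned as (code, reason) instead of raising.
    """
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status, e.g. 401 for an authentication failure
        return err.code, str(err.reason)

# Offline demonstration: a stand-in opener that always fails with 401
def failing_opener(request, timeout=None):
    raise urllib.error.HTTPError(request.full_url, 401, "Unauthorized", None, None)

req = urllib.request.Request("https://openapi.naver.com/v1/search/blog?query=test")
print(fetch_text(req, opener=failing_opener))  # (401, 'Unauthorized')
```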
Book search
In [ ]:
import urllib.parse
import urllib.request

client_id = "YcEkBF8FxV1SDWUqscNW"
client_secret = "OQCYXHHUbR"

encText = urllib.parse.quote("파이썬")
url = "https://openapi.naver.com/v1/search/book?query=" + encText  # JSON result
# url = "https://openapi.naver.com/v1/search/book.xml?query=" + encText  # XML result

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id", client_id)
request.add_header("X-Naver-Client-Secret", client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if rescode == 200:
    response_body = response.read()
    print(response_body.decode("utf-8"))
else:
    print("Error Code:" + str(rescode))
Movie search
In [4]:
import urllib.parse
import urllib.request

client_id = "YcEkBF8FxV1SDWUqscNW"
client_secret = "OQCYXHHUbR"

encText = urllib.parse.quote("파이썬")
url = "https://openapi.naver.com/v1/search/movie?query=" + encText  # JSON result
# url = "https://openapi.naver.com/v1/search/movie.xml?query=" + encText  # XML result

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id", client_id)
request.add_header("X-Naver-Client-Secret", client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if rescode == 200:
    response_body = response.read()
    print(response_body.decode("utf-8"))  # decode the bytes as UTF-8 to get readable text
else:
    print("Error Code:" + str(rescode))
{
  "lastBuildDate":"Thu, 09 Mar 2023 22:31:05 +0900",
  "total":1,
  "start":1,
  "display":1,
  "items":[
    {
      "title":"<b>파이썬<\/b> 앤 가드",
      "link":"https:\/\/movie.naver.com\/movie\/bi\/mi\/basic.nhn?code=152070",
      "image":"https:\/\/ssl.pstatic.net\/imgmovie\/mdi\/mit110\/1520\/152070_P01_145336.jpg",
      "subtitle":"PYTHON AND GUARD",
      "pubDate":"2015",
      "director":"안톤 디아코프|",
      "actor":"",
      "userRating":"0.00"
    }
  ]
}
Cafe article search
In [ ]:
import urllib.parse
import urllib.request

client_id = "YcEkBF8FxV1SDWUqscNW"
client_secret = "OQCYXHHUbR"

encText = urllib.parse.quote("파이썬")
url = "https://openapi.naver.com/v1/search/cafearticle?query=" + encText  # JSON result
# url = "https://openapi.naver.com/v1/search/cafearticle.xml?query=" + encText  # XML result

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id", client_id)
request.add_header("X-Naver-Client-Secret", client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if rescode == 200:
    response_body = response.read()
    print(response_body.decode("utf-8"))  # decode the bytes as UTF-8 to get readable text
else:
    print("Error Code:" + str(rescode))
Encyclopedia (encyc) search
In [ ]:
import urllib.parse
import urllib.request

client_id = "YcEkBF8FxV1SDWUqscNW"
client_secret = "OQCYXHHUbR"

encText = urllib.parse.quote("파이썬")
url = "https://openapi.naver.com/v1/search/encyc?query=" + encText  # JSON result
# url = "https://openapi.naver.com/v1/search/encyc.xml?query=" + encText  # XML result

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id", client_id)
request.add_header("X-Naver-Client-Secret", client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if rescode == 200:
    response_body = response.read()
    print(response_body.decode("utf-8"))  # decode the bytes as UTF-8 to get readable text
else:
    print("Error Code:" + str(rescode))
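The printed responses are JSON text. The `\/` sequences in the movie result are JSON's optional escaping of `/`, and they disappear once the text is parsed with `json.loads`. The sketch below parses an abridged copy of that movie response (trimmed to a few of its fields) and pulls out individual values; the `director` value ends with a `|` separator, so it is split and empty pieces are dropped.

```python
import json

# Abridged copy of the movie-search response printed above
body = r'''{"total":1, "start":1, "display":1,
 "items":[{"title":"<b>파이썬<\/b> 앤 가드",
           "link":"https:\/\/movie.naver.com\/movie\/bi\/mi\/basic.nhn?code=152070",
           "subtitle":"PYTHON AND GUARD", "pubDate":"2015",
           "director":"안톤 디아코프|", "userRating":"0.00"}]}'''

data = json.loads(body)  # JSON text -> Python dict
movie = data["items"][0]
directors = [name for name in movie["director"].split("|") if name]

print(movie["link"])      # https://movie.naver.com/movie/bi/mi/basic.nhn?code=152070
print(movie["title"])     # <b>파이썬</b> 앤 가드
print(directors)          # ['안톤 디아코프']
print(float(movie["userRating"]))  # 0.0
```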
3. Shopping search
- 몰스킨 (Moleskine)
In [ ]:
import urllib.parse
import urllib.request

client_id = "YcEkBF8FxV1SDWUqscNW"
client_secret = "OQCYXHHUbR"

encText = urllib.parse.quote("몰스킨")
url = "https://openapi.naver.com/v1/search/shop?query=" + encText  # JSON result
# url = "https://openapi.naver.com/v1/search/shop.xml?query=" + encText  # XML result

request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id", client_id)
request.add_header("X-Naver-Client-Secret", client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if rescode == 200:
    response_body = response.read()
    print(response_body.decode("utf-8"))  # decode the bytes as UTF-8 to get readable text
else:
    print("Error Code:" + str(rescode))
(1) gen_search_url()
In [8]:
import urllib.parse

def gen_search_url(api_node, search_text, start_num, disp_num):
    base = "https://openapi.naver.com/v1/search"
    node = "/" + api_node + ".json"
    param_query = "?query=" + urllib.parse.quote(search_text)
    param_start = "&start=" + str(start_num)  # str() so the number can be concatenated
    param_disp = "&display=" + str(disp_num)
    return base + node + param_query + param_start + param_disp
In [9]:gen_search_url("shop","몰스킨",10,3)
Out[9]:'https://openapi.naver.com/v1/search/shop.json?query=%EB%AA%B0%EC%8A%A4%ED%82%A8&start=10&display=3'
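The same URL can be assembled with `urllib.parse.urlencode`, which converts and quotes every parameter and joins them with `&`. This is a sketch of an alternative, not the post's function; `gen_search_url_v2` is a hypothetical name. For this input it yields exactly the string shown in Out[9].

```python
from urllib.parse import urlencode

def gen_search_url_v2(api_node, search_text, start_num, disp_num):
    """Hypothetical variant of gen_search_url built on urlencode."""
    base = "https://openapi.naver.com/v1/search/" + api_node + ".json"
    # urlencode applies str() to the numbers and percent-encodes the query text
    params = {"query": search_text, "start": start_num, "display": disp_num}
    return base + "?" + urlencode(params)

print(gen_search_url_v2("shop", "몰스킨", 10, 3))
```

One difference to keep in mind: `urlencode` encodes a space as `+` (form encoding), while `quote` produces `%20`; servers commonly accept both in a query string, but it is worth checking if a multi-word query behaves differently.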
(2) get_result_onpage()
In [10]:
import json
import datetime
import urllib.request

def get_result_onpage(url):
    # client_id and client_secret are the values defined in the earlier cells
    request = urllib.request.Request(url)
    request.add_header("X-Naver-Client-Id", client_id)
    request.add_header("X-Naver-Client-Secret", client_secret)
    response = urllib.request.urlopen(request)
    print("[%s] Url Request Success" % datetime.datetime.now())  # log the time of the request
    return json.loads(response.read().decode("utf-8"))
In [11]:datetime.datetime.now()
Out[11]:datetime.datetime(2023, 3, 9, 22, 31, 6, 427260)
In [12]:
url = gen_search_url("shop", "몰스킨", 1, 5)
one_result = get_result_onpage(url)
[2023-03-09 22:31:06.602064] Url Request Success
In [13]:one_result  # the parsed JSON response is now a Python dict
Out[13]:{'lastBuildDate': 'Thu, 09 Mar 2023 22:31:06 +0900', 'total': 37727, 'start': 1, 'display': 5, 'items': [{'title': '<b>몰스킨</b> 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플', 'link': 'https://search.shopping.naver.com/gate.nhn?id=82526953942', 'image': 'https://shopping-phinf.pstatic.net/main_8252695/82526953942.7.jpg', 'lprice': '28800', 'hprice': '', 'mallName': '베스트펜', 'productId': '82526953942', 'productType': '2', 'brand': '몰스킨', 'maker': '', 'category1': '생활/건강', 'category2': '문구/사무용품', 'category3': '노트/수첩', 'category4': '노트'}, {'title': '빈폴 BEAN POLE 카키 <b>몰스킨</b> 프렌치 워크 재킷 291119', 'link': 'https://search.shopping.naver.com/gate.nhn?id=35704136246', 'image': 'https://shopping-phinf.pstatic.net/main_3570413/35704136246.20230305082320.jpg', 'lprice': '203620', 'hprice': '', 'mallName': '네이버', 'productId': '35704136246', 'productType': '1', 'brand': '빈폴', 'maker': '', 'category1': '패션의류', 'category2': '남성의류', 'category3': '재킷', 'category4': ''}, {'title': '<b>Moleskine</b> 2023년 데일리 플래너 12M 포켓 하드 커버 3 5 x 5 5 - <b>몰스킨</b>', 'link': 'https://search.shopping.naver.com/gate.nhn?id=36557437115', 'image': 'https://shopping-phinf.pstatic.net/main_3655743/36557437115.20221216090445.jpg', 'lprice': '25620', 'hprice': '', 'mallName': '네이버', 'productId': '36557437115', 'productType': '1', 'brand': '몰스킨', 'maker': '', 'category1': '생활/건강', 'category2': '문구/사무용품', 'category3': '다이어리/플래너', 'category4': '다이어리'}, {'title': '빈폴 22FW <b>몰스킨</b> 프렌치 워크 재킷 BC2911C19P', 'link': 'https://search.shopping.naver.com/gate.nhn?id=38120555265', 'image': 'https://shopping-phinf.pstatic.net/main_3812055/38120555265.20230305164101.jpg', 'lprice': '205870', 'hprice': '', 'mallName': '네이버', 'productId': '38120555265', 'productType': '1', 'brand': '빈폴', 'maker': '', 'category1': '패션의류', 'category2': '남성의류', 'category3': '재킷', 'category4': ''}, {'title': '2023년 <b>몰스킨</b> 하드커버 다이어리(데일리, 위클리, 한정판)', 'link': 'https://search.shopping.naver.com/gate.nhn?id=84904377827', 'image': 
'https://shopping-phinf.pstatic.net/main_8490437/84904377827.1.jpg', 'lprice': '20000', 'hprice': '', 'mallName': '안네프랑크', 'productId': '84904377827', 'productType': '2', 'brand': '몰스킨', 'maker': '몰스킨', 'category1': '생활/건강', 'category2': '문구/사무용품', 'category3': '다이어리/플래너', 'category4': '다이어리'}]}
In [14]:one_result["items"][0]["title"]
Out[14]:'<b>몰스킨</b> 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플'
In [15]:one_result["items"][0]["link"]
Out[15]:'https://search.shopping.naver.com/gate.nhn?id=82526953942'
In [16]:one_result["items"][0]["lprice"]
Out[16]:'28800'
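Note that `lprice` comes back as the string `'28800'`, not a number; every field shown in Out[13] is text. The sketch below, using the prices and mall names of the five items in Out[13], shows why the prices should be converted to `int` before they are compared: comparing the raw strings is alphabetical.

```python
# Prices and mall names of the five items in Out[13]
items = [
    {"lprice": "28800", "mallName": "베스트펜"},
    {"lprice": "203620", "mallName": "네이버"},
    {"lprice": "25620", "mallName": "네이버"},
    {"lprice": "205870", "mallName": "네이버"},
    {"lprice": "20000", "mallName": "안네프랑크"},
]

# String comparison is character by character: "28800" > "203620" because "8" > "0"
print(max(item["lprice"] for item in items))  # 28800
# Converting to int compares the actual prices
cheapest = min(items, key=lambda item: int(item["lprice"]))
print(cheapest["lprice"], cheapest["mallName"])  # 20000 안네프랑크
```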
In [17]:
import pandas as pd

def get_fields(json_data):
    title = [each["title"] for each in json_data["items"]]
    link = [each["link"] for each in json_data["items"]]
    lprice = [each["lprice"] for each in json_data["items"]]
    mall_name = [each["mallName"] for each in json_data["items"]]
    result_pd = pd.DataFrame({
        "title": title,
        "link": link,
        "lprice": lprice,
        "mall": mall_name,
    }, columns=["title", "lprice", "link", "mall"])
    return result_pd
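Because `items` is a list of dicts, pandas can also build the frame from it in one call and then keep only the needed columns. A sketch of that alternative follows (`get_fields_v2` is a hypothetical name), run on two items from Out[13] with abridged titles. One design difference: if a page returned no items, this version would raise `KeyError` on the column selection, whereas `get_fields` returns an empty frame with the four columns.

```python
import pandas as pd

def get_fields_v2(json_data):
    """Hypothetical variant of get_fields: one DataFrame call over items."""
    df = pd.DataFrame(json_data["items"])
    # keep the same four columns, in the same order, renaming mallName -> mall
    return df[["title", "lprice", "link", "mallName"]].rename(columns={"mallName": "mall"})

sample = {"items": [
    {"title": "<b>몰스킨</b> 노트", "lprice": "28800", "brand": "몰스킨",
     "link": "https://search.shopping.naver.com/gate.nhn?id=82526953942", "mallName": "베스트펜"},
    {"title": "2023년 <b>몰스킨</b> 하드커버 다이어리", "lprice": "20000", "brand": "몰스킨",
     "link": "https://search.shopping.naver.com/gate.nhn?id=84904377827", "mallName": "안네프랑크"},
]}
print(get_fields_v2(sample))
```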
In [18]:get_fields(one_result)
Out[18]:
  | title | lprice | link | mall
0 | <b>몰스킨</b> 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플 | 28800 | https://search.shopping.naver.com/gate.nhn?id=... | 베스트펜
1 | 빈폴 BEAN POLE 카키 <b>몰스킨</b> 프렌치 워크 재킷 291119 | 203620 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
2 | <b>Moleskine</b> 2023년 데일리 플래너 12M 포켓 하드 커버 3 ... | 25620 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
3 | 빈폴 22FW <b>몰스킨</b> 프렌치 워크 재킷 BC2911C19P | 205870 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
4 | 2023년 <b>몰스킨</b> 하드커버 다이어리(데일리, 위클리, 한정판) | 20000 | https://search.shopping.naver.com/gate.nhn?id=... | 안네프랑크

(4) delete_tag()
In [19]:
def delete_tag(input_str):
    input_str = input_str.replace("<b>", "")   # remove the opening bold tag
    input_str = input_str.replace("</b>", "")  # remove the closing bold tag
    return input_str
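`delete_tag` removes only `<b>` and `</b>`, which covers the titles in this post. If other tags or HTML entities such as `&amp;` were to appear in a title, a more general sketch strips any tag with a regular expression and then decodes entities with `html.unescape`. `delete_tags` is a hypothetical name, and the regex assumes a tag never contains a literal `>`.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # any <...> tag, e.g. <b> or </b>

def delete_tags(input_str):
    """Hypothetical generalization of delete_tag."""
    without_tags = TAG_RE.sub("", input_str)  # strip tags first...
    return html.unescape(without_tags)        # ...then decode "&amp;" -> "&", etc.

print(delete_tags("<b>몰스킨</b> 노트 가죽 하드커버"))       # 몰스킨 노트 가죽 하드커버
print(delete_tags("<b>Moleskine</b> &amp; <b>몰스킨</b>"))  # Moleskine & 몰스킨
```

Unescaping after stripping means an escaped `&lt;b&gt;` in the text survives as the literal characters `<b>` rather than being mistaken for markup.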
In [20]:
import pandas as pd

def get_fields(json_data):
    title = [delete_tag(each["title"]) for each in json_data["items"]]  # strip <b> tags from titles
    link = [each["link"] for each in json_data["items"]]
    lprice = [each["lprice"] for each in json_data["items"]]
    mall_name = [each["mallName"] for each in json_data["items"]]
    result_pd = pd.DataFrame({
        "title": title,
        "link": link,
        "lprice": lprice,
        "mall": mall_name,
    }, columns=["title", "lprice", "link", "mall"])
    return result_pd
In [21]:get_fields(one_result)
Out[21]:
  | title | lprice | link | mall
0 | 몰스킨 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플 | 28800 | https://search.shopping.naver.com/gate.nhn?id=... | 베스트펜
1 | 빈폴 BEAN POLE 카키 몰스킨 프렌치 워크 재킷 291119 | 203620 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
2 | Moleskine 2023년 데일리 플래너 12M 포켓 하드 커버 3 5 x 5 5... | 25620 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
3 | 빈폴 22FW 몰스킨 프렌치 워크 재킷 BC2911C19P | 205870 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
4 | 2023년 몰스킨 하드커버 다이어리(데일리, 위클리, 한정판) | 20000 | https://search.shopping.naver.com/gate.nhn?id=... | 안네프랑크

In [22]:
url = gen_search_url("shop", "몰스킨", 1, 5)
json_result = get_result_onpage(url)
pd_result = get_fields(json_result)
[2023-03-09 22:31:07.714247] Url Request Success
In [23]:pd_result
Out[23]:
  | title | lprice | link | mall
0 | 몰스킨 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플 | 28800 | https://search.shopping.naver.com/gate.nhn?id=... | 베스트펜
1 | 빈폴 BEAN POLE 카키 몰스킨 프렌치 워크 재킷 291119 | 203620 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
2 | Moleskine 2023년 데일리 플래너 12M 포켓 하드 커버 3 5 x 5 5... | 25620 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
3 | 빈폴 22FW 몰스킨 프렌치 워크 재킷 BC2911C19P | 205870 | https://search.shopping.naver.com/gate.nhn?id=... | 네이버
4 | 2023년 몰스킨 하드커버 다이어리(데일리, 위클리, 한정판) | 20000 | https://search.shopping.naver.com/gate.nhn?id=... | 안네프랑크

(5) actMain()
In [24]:for n in range(1, 1000, 100): print(n)
1 101 201 301 401 501 601 701 801 901
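The loop in the next cell requests 100 items per page starting at 1, 101, ..., 901, i.e. 1,000 items in total. Those bounds presumably reflect the shopping search limits (`display` up to 100, `start` up to 1000); treat both numbers as assumptions to verify against the Naver API documentation. A small hypothetical helper, `page_plan`, makes the page arithmetic explicit and shortens the last page when the target is not a multiple of the page size.

```python
def page_plan(total_wanted, page_size=100, max_start=1000):
    """Hypothetical helper: (start, display) pairs for paging through results.

    page_size and max_start are assumed API limits, not values read from the API.
    """
    plan = []
    for start in range(1, total_wanted + 1, page_size):
        if start > max_start:
            break  # assumed: the API rejects start values above max_start
        display = min(page_size, total_wanted - start + 1)
        plan.append((start, display))
    return plan

print(page_plan(1000))  # ten pages: (1, 100), (101, 100), ..., (901, 100)
print(page_plan(250))   # [(1, 100), (101, 100), (201, 50)]
```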
In [25]:
result_mol = []
for n in range(1, 1000, 100):
    url = gen_search_url("shop", "몰스킨", n, 100)  # 100 items per page, starting at n
    json_result = get_result_onpage(url)
    pd_result = get_fields(json_result)
    result_mol.append(pd_result)

result_mol = pd.concat(result_mol)  # stack the ten pages into one DataFrame
[2023-03-09 22:31:08.199193] Url Request Success
[2023-03-09 22:31:08.507452] Url Request Success
[2023-03-09 22:31:08.822472] Url Request Success
[2023-03-09 22:31:09.118675] Url Request Success
[2023-03-09 22:31:09.551888] Url Request Success
[2023-03-09 22:31:09.871345] Url Request Success
[2023-03-09 22:31:10.203726] Url Request Success
[2023-03-09 22:31:10.526274] Url Request Success
[2023-03-09 22:31:10.845780] Url Request Success
[2023-03-09 22:31:11.182225] Url Request Success
In [26]:result_mol.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   title   1000 non-null   object
 1   lprice  1000 non-null   object
 2   link    1000 non-null   object
 3   mall    1000 non-null   object
dtypes: object(4)
memory usage: 39.1+ KB
In [27]:
result_mol["lprice"] = result_mol["lprice"].astype("float")  # lprice arrives as a string; convert it to a number
result_mol.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   title   1000 non-null   object
 1   lprice  1000 non-null   float64
 2   link    1000 non-null   object
 3   mall    1000 non-null   object
dtypes: float64(1), object(3)
memory usage: 39.1+ KB
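The `info()` output shows `Int64Index: 1000 entries, 0 to 99`: each page keeps its own 0 to 99 index, so `pd.concat` produces repeated row labels, and those repeats are also what `to_excel` writes into column A. Passing `ignore_index=True` renumbers the rows. Separately, `pd.to_numeric(..., errors="coerce")` is a more forgiving alternative to `astype("float")` that turns an unparsable price into NaN instead of raising. A sketch with two tiny stand-in pages (the data is made up for illustration):

```python
import pandas as pd

# Two stand-in "pages", each with its own 0-based index like get_fields returns
page1 = pd.DataFrame({"title": ["a", "b"], "lprice": ["28800", "203620"]})
page2 = pd.DataFrame({"title": ["c", "d"], "lprice": ["25620", ""]})

stacked = pd.concat([page1, page2])                        # index: 0, 1, 0, 1
renumbered = pd.concat([page1, page2], ignore_index=True)  # index: 0, 1, 2, 3

# Empty or malformed prices become NaN instead of raising ValueError
renumbered["lprice"] = pd.to_numeric(renumbered["lprice"], errors="coerce")
print(list(stacked.index), list(renumbered.index))
print(renumbered)
```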
(5) to_excel()
In [28]:!pip install xlsxwriter
Requirement already satisfied: xlsxwriter in c:\users\admin\miniconda3\envs\ds_study\lib\site-packages (3.0.8)
In [29]:
writer = pd.ExcelWriter("../data/06_molskin_diary_in_naver_shop.xlsx", engine="xlsxwriter")
result_mol.to_excel(writer, sheet_name="Sheet1")

workbook = writer.book
worksheet = writer.sheets["Sheet1"]
# Column A holds the DataFrame index; B to E hold title, lprice, link, mall
worksheet.set_column("A:A", 4)
worksheet.set_column("B:B", 60)
worksheet.set_column("C:C", 10)
worksheet.set_column("D:D", 10)
worksheet.set_column("E:E", 50)
worksheet.set_column("F:F", 10)
worksheet.conditional_format("C2:C1001", {"type": "3_color_scale"})  # color the lprice cells on a 3-color scale by value

writer.close()  # writes the file; replaces the deprecated writer.save()

When this cell was run with writer.save() instead of writer.close(), pandas emitted: FutureWarning: save is not part of the public API, usage can give unexpected results and will be removed in a future version. writer.close() saves the file without that warning.
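The `set_column` and `conditional_format` ranges are hard-coded letters. Because `to_excel` writes the index in column A, the DataFrame columns land in B (title), C (lprice), D (link) and E (mall), so `E:E` at width 50 widens the mall column and `F:F` is empty. A small sketch with hypothetical helpers `excel_col` and `column_range` derives the letters from the DataFrame's column order instead of hard-coding them.

```python
def excel_col(n):
    """Convert a 0-based column number to Excel letters: 0 -> A, 26 -> AA."""
    letters = ""
    n += 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def column_range(columns, name, n_rows, index=True):
    """Hypothetical helper: data-cell range of one column, below the header row."""
    offset = 1 if index else 0  # to_excel writes the index in column A by default
    col = excel_col(columns.index(name) + offset)
    return f"{col}2:{col}{n_rows + 1}"

columns = ["title", "lprice", "link", "mall"]
print(excel_col(0), excel_col(25), excel_col(26))  # A Z AA
print(column_range(columns, "lprice", 1000))       # C2:C1001
print(column_range(columns, "mall", 1000))         # E2:E1001
```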
In [32]:!dir "../data/06_molskin_diary_in_naver_shop.xlsx"
 Volume in drive C has no label.
 Volume Serial Number is 0874-116F

 Directory of C:\Users\admin\Documents\ds_study\data

2023-03-09  PM 10:31            72,240 06_molskin_diary_in_naver_shop.xlsx
               1 File(s)         72,240 bytes
               0 Dir(s)  48,928,006,144 bytes free

(6) Visualization
In [35]:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc
%matplotlib inline

rc("font", family="Malgun Gothic")  # a Windows font with Korean glyphs, so the mall names render
In [39]:
plt.figure(figsize=(15, 6))
sns.countplot(
    x="mall",
    data=result_mol,
    palette="RdYlGn",
    order=result_mol["mall"].value_counts().index,  # order the bars by count, most frequent first
)
plt.xticks(rotation="vertical")
plt.show()
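`countplot` counts the rows per mall, and `order=result_mol["mall"].value_counts().index` sorts the bars from the most to the least frequent mall. The same counting can be sketched with the standard library's `collections.Counter`, shown here on the mall names of the five items in Out[13]. With all 1,000 rows there may be many malls, and keeping only the top N (for example via `most_common(N)`) may make the axis easier to read.

```python
from collections import Counter

# Mall names of the five items in Out[13]
malls = ["베스트펜", "네이버", "네이버", "네이버", "안네프랑크"]

counts = Counter(malls)
# most_common() sorts by count, highest first, like value_counts()
for mall, n in counts.most_common():
    print(mall, n)

top_1 = [mall for mall, _ in counts.most_common(1)]
print(top_1)  # ['네이버']
```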