EDA) Web Crawling Basics Example - Chicago Sandwiches
EDA 2023. 3. 10. 22:21
03. Web Data
1. BeautifulSoup for web data
BeautifulSoup Basic
- install
  - conda install -c anaconda beautifulsoup4
  - pip install beautifulsoup4
- data
  - test_first.html
In [1]:conda install -c anaconda beautifulsoup4
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Note: you may need to restart the kernel to use updated packages.
In [2]:# import
from bs4 import BeautifulSoup
In [ ]:# Read the local HTML file and parse it
with open("../data/03. bin.html", "r") as f:
    page = f.read()
soup = BeautifulSoup(page, "html.parser")
print(soup.prettify())
In [4]:# Check the head tag
soup.head
Out[4]:<head> <title>Very Simple HTML Code by bin</title> </head>
In [5]:# Check the body tag
soup.body
Out[5]:<body> <div> <p "first"="" class="inner-text first-item" id=""> Happy bin. <a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a> </p> <p "second"="" class="inner-text-second-item" id=""> Happy Data Science <a href="http://www.python.org" id="py-link" target="_blink">Python</a> </p> </div> <p class="outer-text first-item" id="second"> <b> Data Science is funny</b> </p> <p class="outer-text"> <i>All I need is Love</i> </p> </body>
In [6]:# Check the p tag
# Only the first p tag found is returned
soup.p
Out[6]:<p "first"="" class="inner-text first-item" id=""> Happy bin. <a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a> </p>
In [7]:soup.find("p")
Out[7]:<p "first"="" class="inner-text first-item" id=""> Happy bin. <a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a> </p>
In [8]:soup.find("p", class_="inner-text first-item")
Out[8]:<p "first"="" class="inner-text first-item" id=""> Happy bin. <a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a> </p>
In [9]:soup.find("p",{"class":"outer-text first-item"})
Out[9]:<p class="outer-text first-item" id="second"> <b> Data Science is funny</b> </p>
In [10]:soup.find("p",{"class":"outer-text first-item"}).text.strip()
Out[10]:'Data Science is funny'
In [11]:# Multiple conditions
# Returns None for this file: as Out[5] shows, the inner p tags were parsed with id=""
soup.find("p", {"class":"inner-text first-item", "id":"first"})
In [12]:# find_all(): returns every matching tag
# The result behaves like a list
soup.find_all("p")
Out[12]:[<p "first"="" class="inner-text first-item" id=""> Happy bin. <a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a> </p>, <p "second"="" class="inner-text-second-item" id=""> Happy Data Science <a href="http://www.python.org" id="py-link" target="_blink">Python</a> </p>, <p class="outer-text first-item" id="second"> <b> Data Science is funny</b> </p>, <p class="outer-text"> <i>All I need is Love</i> </p>]
In [13]:# Look up tags by class (class_)
soup.find_all(class_ = "outer-text")
Out[13]:[<p class="outer-text first-item" id="second"> <b> Data Science is funny</b> </p>, <p class="outer-text"> <i>All I need is Love</i> </p>]
In [14]:# Look up tags by id
soup.find_all(id = "pw-link")
Out[14]:[<a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a>]
In [15]:# The result is a list, so index it with [0] before reading .text
soup.find_all(id = "pw-link")[0].text
Out[15]:'pinkWink'
In [16]:len(soup.find_all("p"))
Out[16]:4
In [17]:print(soup.find_all("p")[0].text)
print(soup.find_all("p")[1].string)
print(soup.find_all("p")[1].get_text())
Happy bin. pinkWink
None
Happy Data Science Python
In [18]:# Print only the text of each tag in the list of p tags
for each_tag in soup.find_all("p"):  # iterates over the 4 p tags
    print("=" * 50)
    print(each_tag.text)
==================================================
Happy bin. pinkWink
==================================================
Happy Data Science Python
==================================================
Data Science is funny
==================================================
All I need is Love
In [19]:# Extract the href attribute value from the a tags
links = soup.find_all("a")
links[0].get("href"), links[1]["href"]
Out[19]:('http://www.pinkwink.kr', 'http://www.python.org')
In [20]:for each in links:
    text = each.get_text()
    href = each.get("href")  # each["href"]
    print(text + "->" + href)
pinkWink->http://www.pinkwink.kr
Python->http://www.python.org
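The lookups above can be tried without the data file. The following is a minimal sketch that parses a small inline HTML string; the markup is a hypothetical, well-formed variant of the test page, not the actual file, and it assumes beautifulsoup4 is installed. It puts find/find_all next to their CSS-selector counterparts select_one/select.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, loosely modeled on the test page above.
html = """
<html><body>
  <div>
    <p class="inner-text first-item" id="first">Happy bin.
      <a href="http://www.pinkwink.kr" id="pw-link">pinkWink</a></p>
    <p class="inner-text second-item" id="second">Happy Data Science
      <a href="http://www.python.org" id="py-link">Python</a></p>
  </div>
  <p class="outer-text">All I need is Love</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match (or None); find_all() returns every match.
first_p = soup.find("p", class_="inner-text")
all_p = soup.find_all("p")
no_match = soup.find("p", id="does-not-exist")

# select_one()/select() take CSS selectors: "#" is an id, "." is a class,
# and ">" restricts the match to direct children.
link = soup.select_one("#pw-link")
inner = soup.select("div > p.inner-text")

# .get("href") returns None for a missing attribute; ["href"] would raise KeyError.
hrefs = [a.get("href") for a in soup.find_all("a")]

print(first_p["id"], len(all_p), no_match, link.get_text(), len(inner), hrefs)
```

Because find() and select_one() return None when nothing matches, chaining .text directly onto them raises AttributeError on a miss.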
BeautifulSoup Example 1-1. Naver Finance
In [21]:# import
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
In [ ]:url = "https://finance.naver.com/marketindex/"
page = urlopen(url)
# response = urlopen(url)
# response.status  # HTTP status code
soup = BeautifulSoup(page, "html.parser")
print(soup.prettify())
In [23]:# 1
soup.find_all("span", "value"), len(soup.find_all("span", "value"))
Out[23]:([<span class="value">1,321.00</span>, <span class="value">966.24</span>, <span class="value">1,398.54</span>, <span class="value">189.76</span>, <span class="value">136.1200</span>, <span class="value">1.0574</span>, <span class="value">1.1918</span>, <span class="value">105.3000</span>, <span class="value">75.72</span>, <span class="value">1592.97</span>, <span class="value">1834.6</span>, <span class="value">78022.8</span>], 12)
In [24]:# 2
soup.find_all("span", class_="value"), len(soup.find_all("span", "value"))
Out[24]:([<span class="value">1,321.00</span>, <span class="value">966.24</span>, <span class="value">1,398.54</span>, <span class="value">189.76</span>, <span class="value">136.1200</span>, <span class="value">1.0574</span>, <span class="value">1.1918</span>, <span class="value">105.3000</span>, <span class="value">75.72</span>, <span class="value">1592.97</span>, <span class="value">1834.6</span>, <span class="value">78022.8</span>], 12)
In [25]:# 3
soup.find_all("span", {"class":"value"}), len(soup.find_all("span", "value"))
Out[25]:([<span class="value">1,321.00</span>, <span class="value">966.24</span>, <span class="value">1,398.54</span>, <span class="value">189.76</span>, <span class="value">136.1200</span>, <span class="value">1.0574</span>, <span class="value">1.1918</span>, <span class="value">105.3000</span>, <span class="value">75.72</span>, <span class="value">1592.97</span>, <span class="value">1834.6</span>, <span class="value">78022.8</span>], 12)
In [26]:soup.find_all("span",{"class":"value"})[0].text,soup.find_all("span",{"class":"value"})[0].string,soup.find_all("span",{"class":"value"})[0].get_text()
Out[26]:('1,321.00', '1,321.00', '1,321.00')
BeautifulSoup Example 1-2. Naver Finance
- !pip install requests
- find, select_one: select a single element
- select, find_all: select multiple elements
In [27]:import requests
from bs4 import BeautifulSoup
In [ ]:url = "https://finance.naver.com/marketindex/"
response = requests.get(url)
# requests.get(), requests.post()
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())
In [29]:# soup.find_all("li","on")
# id -> "#"
# class -> "."
# ">" -> direct children only
exchangeList = soup.select("#exchangeList > li")
len(exchangeList)
Out[29]:4
In [30]:title = exchangeList[0].select_one(".h_lst").text
exchange = exchangeList[0].select_one(".value").text
change = exchangeList[0].select_one(".change").text
# select_one() returns None when nothing matches, so .text raises AttributeError;
# in that case fall back to the "point_up" variant.
try:
    updown = exchangeList[0].select_one("div.head_info.point_dn > .blind").text
except AttributeError:
    updown = exchangeList[0].select_one("div.head_info.point_up > .blind").text
# A space inside a class attribute means the element has two classes; join them with "." in the selector
title, exchange, change, updown
Out[30]:('미국 USD', '1,321.00', '1.00', '상승')
In [31]:findmethod = soup.find_all("ul", id="exchangeList")
findmethod[0].find_all("span","value")
Out[31]:[<span class="value">1,321.00</span>, <span class="value">966.24</span>, <span class="value">1,398.54</span>, <span class="value">189.76</span>]
In [32]:# link
baseurl = "http://finance.naver.com"
baseurl + exchangeList[0].select_one("a").get("href")
Out[32]:'http://finance.naver.com/marketindex/exchangeDetail.naver?marketindexCd=FX_USDKRW'
In [33]:# Collect the data for all 4 items
exchange_datas = []
baseurl = "http://finance.naver.com"

for item in exchangeList:
    try:
        updown = item.select_one("div.head_info.point_dn > .blind").text
    except AttributeError:
        updown = item.select_one("div.head_info.point_up > .blind").text
    data = {
        "title": item.select_one(".h_lst").text,
        "exchange": item.select_one(".value").text,
        "change": item.select_one(".change").text,
        "updown": updown,
        "url": baseurl + item.select_one("a").get("href"),
    }
    exchange_datas.append(data)

# A list of dicts converts directly into a table (DataFrame)
df = pd.DataFrame(exchange_datas)
# The original call passed encoding="utf-8", which made pandas emit
# "FutureWarning: the 'encoding' keyword is deprecated and will be removed in a future version",
# so the argument is dropped here.
df.to_excel("./naverfinance.xlsx")
BeautifulSoup Example 2 - Fetching Wikipedia Article Information
In [ ]:import urllib.parse
from bs4 import BeautifulSoup
from urllib.request import urlopen, Request

html = "https://ko.wikipedia.org/wiki/{search_words}"
# Percent-encode the non-ASCII article title so it can be used in a URL
req = Request(html.format(search_words = urllib.parse.quote("여명의_눈동자")))
response = urlopen(req)
response.status
soup = BeautifulSoup(response, "html.parser")
print(soup.prettify())
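The encoding step in the cell above can be looked at on its own, without any network access. This is a minimal sketch using the same title string and URL template as the cell; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

title = "여명의_눈동자"

# quote() percent-encodes the UTF-8 bytes of every character that is not URL-safe.
encoded = quote(title)
url = "https://ko.wikipedia.org/wiki/{search_words}".format(search_words=encoded)

print(encoded)
print(url)

# unquote() reverses the encoding.
print(unquote(encoded) == title)
```

The underscore is left untouched because it is already safe in a URL path; only the Hangul characters are replaced by %XX sequences.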
Python List Data Type
In [35]:colors = ["red","blue","green"]
b = colors  # b is a second name for the same list object
b
Out[35]:['red', 'blue', 'green']
In [36]:b[1] = "black"
b
Out[36]:['red', 'black', 'green']
In [37]:c = colors.copy()  # c is an independent copy
c
Out[37]:['red', 'black', 'green']
In [38]:c[1] = "yellow"
c
Out[38]:['red', 'yellow', 'green']
In [39]:colors  # changed through b, but not through c
Out[39]:['red', 'black', 'green']
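The cells above show that assigning a list to a new name does not copy it. A compact sketch of the same behavior follows, using the `is` operator to make the difference explicit; the names `alias` and `snapshot` are illustrative.

```python
colors = ["red", "blue", "green"]

alias = colors            # same list object under a second name
snapshot = colors.copy()  # new list holding the same items (shallow copy)

alias[1] = "black"        # visible through both `alias` and `colors`
snapshot[1] = "yellow"    # changes only the copy

print(alias is colors, snapshot is colors)
print(colors, snapshot)
```

copy() is shallow: nested lists inside the list would still be shared between the original and the copy.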
- Using a list in a for loop
In [40]:for color in colors:
    print(color)
red
black
green
- Using the in operator in an if statement
In [41]:if "white" in colors:
    print("True")
In [42]:movies = ["라라랜드","킹스맨","어벤져스","다크나이트"]
movies.append("러브레터")
movies
Out[42]:['라라랜드', '킹스맨', '어벤져스', '다크나이트', '러브레터']
- pop: removes and returns the last item of the list
In [43]:movies.pop()
Out[43]:'러브레터'
In [44]:movies
Out[44]:['라라랜드', '킹스맨', '어벤져스', '다크나이트']
- extend: appends every item of another list to the end
In [45]:movies.extend(["위대한쇼맨","인셉션"])
movies
Out[45]:['라라랜드', '킹스맨', '어벤져스', '다크나이트', '위대한쇼맨', '인셉션']
- remove: deletes the first item equal to the given value
In [46]:movies.remove("킹스맨")
movies
Out[46]:['라라랜드', '어벤져스', '다크나이트', '위대한쇼맨', '인셉션']
- insert: inserts an item at the given position
In [47]:movies.insert(1, 9.6)
movies.insert(2,["봄날","그해여름"])
- isinstance: returns True/False depending on whether a value has the given type
In [48]:isinstance(movies, str)
isinstance(movies, list)
Out[48]:True
In [49]:for each_item in movies:
    if isinstance(each_item, list):
        for nested_item in each_item:
            print("nested_item", nested_item)
    else:
        print("each_item", each_item)
each_item 라라랜드
each_item 9.6
nested_item 봄날
nested_item 그해여름
each_item 어벤져스
each_item 다크나이트
each_item 위대한쇼맨
each_item 인셉션
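The list methods above differ mainly in how many items they add and where. The sketch below contrasts append, extend and insert on a small list and then flattens the result one level deep, in the same way as the isinstance loop; the movie titles and the `flatten` helper are illustrative only.

```python
movies = ["La La Land", "Kingsman"]

movies.append(["Inception"])    # append adds ONE item: here a nested list
movies.extend(["Love Letter"])  # extend adds each item of the given iterable
movies.insert(1, 9.6)           # insert places the item before index 1


def flatten(items):
    """Yield items one level deep, unpacking any nested list."""
    for item in items:
        if isinstance(item, list):
            yield from item
        else:
            yield item


flat = list(flatten(movies))
print(movies)
print(flat)
```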
2. Chicago Restaurant Data Analysis - Overview
- https://www.chicagomag.com/chicago-magazine/november-2012/best-sandwiches-chicago/
- Chicago Magazine: the 50 best sandwiches
Final goal: collect each restaurant's information from 51 pages in total (the main list page plus the 50 detail pages)
- Restaurant name
- Signature menu item
- Price of the signature menu item
- Restaurant address
3. Chicago Restaurant Data Analysis - Main Page
In [ ]:from urllib.request import Request, urlopen
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

url_base = "https://www.chicagomag.com/chicago-magazine/"
url_sub = "november-2012/best-sandwiches-chicago/"
url = url_base + url_sub

# Optional: generate a random User-Agent instead
# ua = UserAgent()
# ua.ie
# The user-agent header tells the server which browser is making the request; it avoids a 403 error
req = Request(url, headers={"user-agent": "Chrome"})
# A full, standard value would be:
# {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"}
html = urlopen(req)
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())
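How the header ends up on the request can be checked before anything is sent. The sketch below builds the same kind of Request object with the full User-Agent string quoted in the cell above and inspects it; no network call is made, and the variable names are illustrative.

```python
from urllib.request import Request

url = "https://www.chicagomag.com/chicago-magazine/november-2012/best-sandwiches-chicago/"
ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
)

req = Request(url, headers={"user-agent": ua})

# urllib normalizes header names with str.capitalize(), so the key becomes "User-agent".
print(req.header_items())
print(req.get_header("User-agent") == ua)

# Without a request body, the request is sent as GET.
print(req.get_method(), req.full_url)
```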
In [ ]:soup.find_all("div", class_="sammy"), len(soup.find_all("div", class_="sammy"))
In [52]:tmp_one = soup.find_all("div","sammy")[0]
tmp_one, type(tmp_one)  # each result is a BeautifulSoup Tag object
Out[52]:(<div class="sammy" style="position: relative;"> <div class="sammyRank">1</div> <div class="sammyListing"><a href="/Chicago-Magazine/November-2012/Best-Sandwiches-in-Chicago-Old-Oak-Tap-BLT/"><b>BLT</b><br/> Old Oak Tap<br/> <em>Read more</em> </a></div> </div>, bs4.element.Tag)
In [53]:tmp_one.find(class_="sammyRank")
Out[53]:<div class="sammyRank">1</div>
In [54]:tmp_one.find(class_="sammyRank").get_text() # = tmp_one.select_one(".sammyRank").text
Out[54]:'1'
In [55]:tmp_one.find("div",{"class":"sammyListing"}).text
tmp_one.select_one(".sammyListing").get_text()
Out[55]:'BLT\nOld Oak Tap\nRead more '
In [56]:tmp_one.find("a")["href"]
tmp_one.select_one("a").get("href")
Out[56]:'/Chicago-Magazine/November-2012/Best-Sandwiches-in-Chicago-Old-Oak-Tap-BLT/'
In [57]:# Strip the unneeded text by splitting on line breaks
import re

tmp_string = tmp_one.find(class_="sammyListing").get_text()
re.split(r"\r?\n", tmp_string)
Out[57]:['BLT', 'Old Oak Tap', 'Read more ']
In [58]:print(re.split(r"\r?\n", tmp_string)[0])
print(re.split(r"\r?\n", tmp_string)[1])
BLT
Old Oak Tap
In [59]:from urllib.parse import urljoin

url_base = "https://www.chicagomag.com/chicago-magazine/"

# Empty lists to hold the fields we need.
# Each list becomes one column; they are combined into a DataFrame afterwards.
rank = []
main_menu = []
cafe_name = []
url_add = []

list_soup = soup.find_all("div","sammy")  # soup.select(".sammy")

for item in list_soup:
    rank.append(item.find(class_="sammyRank").get_text())
    tmp_string = item.find(class_ = "sammyListing").get_text()
    main_menu.append(re.split(r"\r?\n", tmp_string)[0])
    cafe_name.append(re.split(r"\r?\n", tmp_string)[1])
    url_add.append(urljoin(url_base, item.find("a")["href"]))
In [60]:len(rank), len(main_menu), len(cafe_name), len(url_add)
Out[60]:(50, 50, 50, 50)
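Two details of the loop above are easy to miss: how urljoin combines the base URL with an href that starts with "/", and how the listing text splits into menu and cafe name. A small sketch using the values shown in Out[55] and Out[56] follows; the example.com URL is a hypothetical value added for contrast.

```python
import re
from urllib.parse import urljoin

url_base = "https://www.chicagomag.com/chicago-magazine/"
href = "/Chicago-Magazine/November-2012/Best-Sandwiches-in-Chicago-Old-Oak-Tap-BLT/"
listing_text = "BLT\nOld Oak Tap\nRead more "

# An href beginning with "/" replaces the whole path of the base URL.
full_url = urljoin(url_base, href)

# An href that is already absolute is returned unchanged (hypothetical value).
absolute = urljoin(url_base, "https://example.com/page/")

# Split on LF or CRLF; the first two fields are the menu and the cafe name.
menu, cafe = re.split(r"\r?\n", listing_text)[:2]

print(full_url, absolute, menu, cafe, sep="\n")
```

This is why the collected URLs start with https://www.chicagomag.com/Chicago-Magazine/ rather than with the lowercase path in url_base.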
In [61]:import pandas as pd

data = {
    "Rank" : rank,
    "Menu" : main_menu,
    "Cafe" : cafe_name,
    "URL" : url_add,
}

df = pd.DataFrame(data)
df.tail()
Out[61]:
   | Rank | Menu | Cafe | URL
45 | 46 | Kufta | Chickpea | https://www.chicagomag.com/Chicago-Magazine/No...
46 | 47 | Debbie’s Egg Salad | The Goddess and Grocer | https://www.chicagomag.com/Chicago-Magazine/No...
47 | 48 | Beef Curry | Zenwich | https://www.chicagomag.com/Chicago-Magazine/No...
48 | 49 | Le Végétarien | Toni Patisserie | https://www.chicagomag.com/Chicago-Magazine/No...
49 | 50 | The Gatsby | Phoebe’s Bakery | https://www.chicagomag.com/Chicago-Magazine/No...
In [62]:# Change the column order
df = pd.DataFrame(data, columns = ["Rank","Cafe","Menu","URL"])
df.tail()
Out[62]:
   | Rank | Cafe | Menu | URL
45 | 46 | Chickpea | Kufta | https://www.chicagomag.com/Chicago-Magazine/No...
46 | 47 | The Goddess and Grocer | Debbie’s Egg Salad | https://www.chicagomag.com/Chicago-Magazine/No...
47 | 48 | Zenwich | Beef Curry | https://www.chicagomag.com/Chicago-Magazine/No...
48 | 49 | Toni Patisserie | Le Végétarien | https://www.chicagomag.com/Chicago-Magazine/No...
49 | 50 | Phoebe’s Bakery | The Gatsby | https://www.chicagomag.com/Chicago-Magazine/No...
In [63]:df.to_csv(
    "../data/03. best_sandwiches_list_chicago.csv", sep=",", encoding="utf-8"
)
4. Chicago Restaurant Data Analysis - Detail Pages
In [64]:# requirements
import pandas as pd
from urllib.request import urlopen, Request
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
In [65]:df = pd.read_csv("../data/03. best_sandwiches_list_chicago.csv", index_col=0)
df.tail()
Out[65]:
   | Rank | Cafe | Menu | URL
45 | 46 | Chickpea | Kufta | https://www.chicagomag.com/Chicago-Magazine/No...
46 | 47 | The Goddess and Grocer | Debbie’s Egg Salad | https://www.chicagomag.com/Chicago-Magazine/No...
47 | 48 | Zenwich | Beef Curry | https://www.chicagomag.com/Chicago-Magazine/No...
48 | 49 | Toni Patisserie | Le Végétarien | https://www.chicagomag.com/Chicago-Magazine/No...
49 | 50 | Phoebe’s Bakery | The Gatsby | https://www.chicagomag.com/Chicago-Magazine/No...
In [66]:df["URL"][0]
Out[66]:'https://www.chicagomag.com/Chicago-Magazine/November-2012/Best-Sandwiches-in-Chicago-Old-Oak-Tap-BLT/'
In [67]:req = Request(df["URL"][0], headers={"user-agent":"chrome"})  # request the page at this URL
html = urlopen(req).read()
soup_tmp = BeautifulSoup(html, "html.parser")
soup_tmp.find("p", class_="addy")  # soup_tmp.select_one(".addy")
Out[67]:<p class="addy"> <em>$10. 2109 W. Chicago Ave., 773-772-0406, <a href="http://www.theoldoaktap.com/">theoldoaktap.com</a></em></p>
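In [68]:# regular expression
price_tmp = soup_tmp.find("p","addy").text
price_tmp
Out[68]:'\n$10. 2109 W. Chicago Ave., 773-772-0406, theoldoaktap.com'
In [69]:import re

# The "." is not escaped, so ".," matches ANY one character followed by a comma.
# That is why the last digit of the phone number is missing in the output below.
re.split(".,",price_tmp)
Out[69]:['\n$10. 2109 W. Chicago Ave', ' 773-772-040', ' theoldoaktap.com']
In [70]:price_tmp = re.split(".,",price_tmp)[0]
price_tmp
Out[70]:'\n$10. 2109 W. Chicago Ave'
In [71]:tmp = re.search(r"\$\d+\.(\d+)?", price_tmp).group()  # group(): returns only the matched text
# Take everything after the price; +2 skips the leading "\n" and the space after the price
price_tmp[len(tmp) + 2:]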
Out[71]:'2109 W. Chicago Ave'
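The split pattern used above drops one character whenever the text before a comma does not end in a period. The sketch below compares it with a variant that treats the period as optional and literal. The first string is the one from Out[68]; the second is a hypothetical address line made up for illustration, and `\.?,` is one possible alternative, not the pattern used in this notebook.

```python
import re

addy = "\n$10. 2109 W. Chicago Ave., 773-772-0406, theoldoaktap.com"
# Hypothetical line whose street name is not followed by a period.
addy_no_dot = "\n$7. 100 N. Example Way, 555-0100, example.com"

# ".," = any one character followed by a comma (the pattern used above).
loose = re.split(".,", addy_no_dot)

# r"\.?," = an optional literal period followed by a comma.
strict = re.split(r"\.?,", addy_no_dot)

print(loose)
print(strict)
print(re.split(r"\.?,", addy))
```

With the loose pattern the street name loses its final letter, which is consistent with truncated values such as "3124 N. Broadwa" and "Multiple location" in the tables of this post.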
In [72]:from tqdm import tqdm

price = []
address = []

# Visit each detail page and pull the price and address out of the "addy" paragraph
for idx, row in tqdm(df.iterrows()):
    req = Request(row["URL"], headers={"user-agent":"chrome"})
    html = urlopen(req).read()
    soup_tmp = BeautifulSoup(html, "html.parser")
    gettings = soup_tmp.find("p", class_="addy").get_text()
    price_tmp = re.split(".,", gettings)[0]
    tmp = re.search(r"\$\d+\.(\d+)?", price_tmp).group()
    price.append(tmp)
    address.append(price_tmp[len(tmp)+2:])
50it [01:22, 1.64s/it]
In [73]:price[:5]
Out[73]:['$10.', '$9.', '$9.50', '$9.40', '$10.']
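The collected prices are still strings such as '$10.' and '$9.50'. If they are needed as numbers later, for sorting or averaging, a conversion along the following lines would work. This is a sketch, not a step from the original notebook, and `to_number` is a hypothetical helper name.

```python
prices = ["$10.", "$9.", "$9.50", "$9.40", "$10."]  # values from Out[73]


def to_number(price):
    """Convert a string such as '$10.' or '$9.50' to a float."""
    # float() accepts a trailing decimal point, so only the "$" has to go.
    return float(price.lstrip("$"))


values = [to_number(p) for p in prices]
print(values)
```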
In [74]:df["Price"] = price
df["Address"] = address
df.tail(2)
Out[74]:
   | Rank | Cafe | Menu | URL | Price | Address
48 | 49 | Toni Patisserie | Le Végétarien | https://www.chicagomag.com/Chicago-Magazine/No... | $8.75 | 65 E. Washington St
49 | 50 | Phoebe’s Bakery | The Gatsby | https://www.chicagomag.com/Chicago-Magazine/No... | $6.85 | 3351 N. Broadwa
In [75]:df = df.loc[:, ["Rank","Cafe","Menu","Price","Address"]]  # select all rows, reorder the columns
df.set_index("Rank", inplace=True)
df.head()
Out[75]:
Rank | Cafe | Menu | Price | Address
1 | Old Oak Tap | BLT | $10. | 2109 W. Chicago Ave
2 | Au Cheval | Fried Bologna | $9. | 800 W. Randolph St
3 | Xoco | Woodland Mushroom | $9.50 | 445 N. Clark St
4 | Al’s Deli | Roast Beef | $9.40 | 914 Noyes St
5 | Publican Quality Meats | PB&L | $10. | 825 W. Fulton Mkt
In [76]:# index=False: the Rank index is not written to the file
df.to_csv(
    "../data/03. best_sandwiches_list_chicago2.csv", sep=",", encoding="UTF-8", index=False)
In [77]:pd.read_csv("../data/03. best_sandwiches_list_chicago2.csv", index_col=0)
Out[77]:
Cafe | Menu | Price | Address
Old Oak Tap | BLT | $10. | 2109 W. Chicago Ave
Au Cheval | Fried Bologna | $9. | 800 W. Randolph St
Xoco | Woodland Mushroom | $9.50 | 445 N. Clark St
Al’s Deli | Roast Beef | $9.40 | 914 Noyes St
Publican Quality Meats | PB&L | $10. | 825 W. Fulton Mkt
Hendrickx Belgian Bread Crafter | Belgian Chicken Curry Salad | $7.25 | 100 E. Walton St
Acadia | Lobster Roll | $16. | 1639 S. Wabash Ave
Birchwood Kitchen | Smoked Salmon Salad | $10. | 2211 W. North Ave
Cemitas Puebla | Atomica Cemitas | $9. | 3619 W. North Ave
Nana | Grilled Laughing Bird Shrimp and Fried Po’ Boy | $17. | 3267 S. Halsted St
Lula Cafe | Ham and Raclette Panino | $11. | 2537 N. Kedzie Blvd
Ricobene’s | Breaded Steak | $5.49 | Multiple location
Frog n Snail | The Hawkeye | $14. | 3124 N. Broadwa
Crosby’s Kitchen | Chicken Dip | $10. | 3455 N. Southport Ave
Longman & Eagle | Wild Boar Sloppy Joe | $13. | 2657 N. Kedzie Ave
Bari | Meatball Sub | $4.50 | 1120 W. Grand Ave
Manny’s | Corned Beef | $11.95 | 1141 S. Jefferson St
Eggy’s | Turkey Club | $11.50 | 333 E. Benton Pl
Old Jerusalem | Falafel | $6.25 | 1411 N. Wells St
Mindy’s HotChocolate | Crab Cake | $15. | 1747 N. Damen Ave
Olga’s Delicatessen | Chicken Schnitzel | $5. | 3209 W. Irving Park Rd
Dawali Mediterranean Kitchen | Shawarma | $6. | Multiple location
Big Jones | Toasted Pimiento Cheese | $8. | 5347 N. Clark St
La Pane | Vegetarian Panino | $5.99 | 2954 W. Irving Park Rd
Pastoral | Cali Chèvre | $7.52 | Multiple location
Max’s Deli | Pastrami | $11.95 | 191 Skokie Valley Rd
Lucky’s Sandwich Co. | The Fredo | $7.50 | Multiple location
City Provisions | Smoked Ham | $12.95 | 1818 W. Wilson Ave
Papa’s Cache Sabroso | Jibarito | $7. | 2517 W. Division St
Bavette’s Bar & Boeuf | Shaved Prime Rib | $21. | 218 W. Kinzie St
Hannah’s Bretzel | Serrano Ham and Manchego Cheese | $9.79 | Multiple location
La Fournette | Tuna Salad | $9.75 | 1547 N. Wells St
Paramount Room | Paramount Reuben | $13. | 415 N. Milwaukee Ave
Melt Sandwich Shoppe | The Istanbul | $7.95 | 1840 N. Damen Ave
Floriole Cafe & Bakery | B.A.D. | $9. | 1220 W. Webster Ave
First Slice Pie Café | Duck Confit and Mozzarella | $9. | 5357 N. Ashland Ave
Troquet | Croque Monsieur | $8. | 1834 W. Montrose Ave
Grahamwich | Green Garbanzo | $8. | 615 N. State St
Saigon Sisters | The Hen House | $7. | Multiple location
Rosalia’s Deli | Tuscan Chicken | $6. | 241 N. York Rd
Z&H MarketCafe | The Marty | $7.25 | 1323 E. 57th St
Market House on the Square | Whitefish | $11. | 655 Forest Ave
Elaine’s Coffee Call | Oat Bread, Pecan Butter, and Fruit Jam | $6. | Hotel Lincol
Marion Street Cheese Market | Cauliflower Melt | $9. | 100 S. Marion St
Cafecito | Cubana | $5.49 | 26 E. Congress Pkwy
Chickpea | Kufta | $8. | 2018 W. Chicago Ave
The Goddess and Grocer | Debbie’s Egg Salad | $6.50 | 25 E. Delaware Pl
Zenwich | Beef Curry | $7.50 | 416 N. York St
Toni Patisserie | Le Végétarien | $8.75 | 65 E. Washington St
Phoebe’s Bakery | The Gatsby | $6.85 | 3351 N. Broadwa
5. Chicago Restaurant Data Map Visualization
In [78]:# requirements
import folium
import pandas as pd
import numpy as np
import googlemaps
from tqdm import tqdm
In [79]:df = pd.read_csv("../data/03. best_sandwiches_list_chicago2.csv")
df.tail(5)
Out[79]:
   | Cafe | Menu | Price | Address
45 | Chickpea | Kufta | $8. | 2018 W. Chicago Ave
46 | The Goddess and Grocer | Debbie’s Egg Salad | $6.50 | 25 E. Delaware Pl
47 | Zenwich | Beef Curry | $7.50 | 416 N. York St
48 | Toni Patisserie | Le Végétarien | $8.75 | 65 E. Washington St
49 | Phoebe’s Bakery | The Gatsby | $6.85 | 3351 N. Broadwa
In [80]:# Placeholder: supply your own Google Maps API key and keep it out of published notebooks
gmaps_key = "YOUR_GOOGLE_MAPS_API_KEY"
gmaps = googlemaps.Client(key=gmaps_key)
In [81]:lat = []
lng = []

for idx, row in tqdm(df.iterrows()):
    if row["Address"] != "Multiple location":
        target_name = row["Address"] + "," + "Chicago"
        # print(target_name)
        gmaps_output = gmaps.geocode(target_name)
        location_output = gmaps_output[0].get("geometry")
        lat.append(location_output["location"]["lat"])
        lng.append(location_output["location"]["lng"])
    else:
        lat.append(np.nan)
        lng.append(np.nan)
50it [00:07, 6.33it/s]
In [82]:len(lat), len(lng)
Out[82]:(50, 50)
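The loop above indexes gmaps_output[0] directly, which would raise IndexError if a lookup returned no match. The sketch below shows one defensive way to pull the coordinates out, assuming a lookup can come back as an empty list. `extract_lat_lng` is a hypothetical helper, and the sample responses are hand-written in the shape the loop accesses, not real API output.

```python
import math


def extract_lat_lng(geocode_result):
    """Return (lat, lng) from a geocode-style result list, or (nan, nan) if it is empty."""
    if not geocode_result:
        return math.nan, math.nan
    location = geocode_result[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written responses shaped like the structure accessed in the loop above.
found = [{"geometry": {"location": {"lat": 41.895558, "lng": -87.679967}}}]
missing = []

print(extract_lat_lng(found))
print(extract_lat_lng(missing))
```

Returning NaN for a missing result keeps the lat and lng lists the same length as the DataFrame, which the column assignment that follows relies on.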
In [83]:df["lat"] = lat
df["lng"] = lng
df
Out[83]:
   | Cafe | Menu | Price | Address | lat | lng
0 | Old Oak Tap | BLT | $10. | 2109 W. Chicago Ave | 41.895558 | -87.679967
1 | Au Cheval | Fried Bologna | $9. | 800 W. Randolph St | 41.884639 | -87.647590
2 | Xoco | Woodland Mushroom | $9.50 | 445 N. Clark St | 41.890523 | -87.630783
3 | Al’s Deli | Roast Beef | $9.40 | 914 Noyes St | 41.878114 | -87.629798
4 | Publican Quality Meats | PB&L | $10. | 825 W. Fulton Mkt | 41.886604 | -87.648536
5 | Hendrickx Belgian Bread Crafter | Belgian Chicken Curry Salad | $7.25 | 100 E. Walton St | 41.900250 | -87.625078
6 | Acadia | Lobster Roll | $16. | 1639 S. Wabash Ave | 41.859054 | -87.625201
7 | Birchwood Kitchen | Smoked Salmon Salad | $10. | 2211 W. North Ave | 41.910203 | -87.682875
8 | Cemitas Puebla | Atomica Cemitas | $9. | 3619 W. North Ave | 41.909758 | -87.717677
9 | Nana | Grilled Laughing Bird Shrimp and Fried Po’ Boy | $17. | 3267 S. Halsted St | 41.834530 | -87.645649
10 | Lula Cafe | Ham and Raclette Panino | $11. | 2537 N. Kedzie Blvd | 41.927621 | -87.706792
11 | Ricobene’s | Breaded Steak | $5.49 | Multiple location | 41.878114 | -87.629798
12 | Frog n Snail | The Hawkeye | $14. | 3124 N. Broadwa | 41.938442 | -87.644644
13 | Crosby’s Kitchen | Chicken Dip | $10. | 3455 N. Southport Ave | 41.945114 | -87.663697
14 | Longman & Eagle | Wild Boar Sloppy Joe | $13. | 2657 N. Kedzie Ave | 41.930056 | -87.707034
15 | Bari | Meatball Sub | $4.50 | 1120 W. Grand Ave | 41.891297 | -87.655545
16 | Manny’s | Corned Beef | $11.95 | 1141 S. Jefferson St | 41.867853 | -87.641929
17 | Eggy’s | Turkey Club | $11.50 | 333 E. Benton Pl | 41.885269 | -87.618484
18 | Old Jerusalem | Falafel | $6.25 | 1411 N. Wells St | 41.908055 | -87.634312
19 | Mindy’s HotChocolate | Crab Cake | $15. | 1747 N. Damen Ave | 41.913695 | -87.677127
20 | Olga’s Delicatessen | Chicken Schnitzel | $5. | 3209 W. Irving Park Rd | 41.953712 | -87.708450
21 | Dawali Mediterranean Kitchen | Shawarma | $6. | Multiple location | NaN | NaN
22 | Big Jones | Toasted Pimiento Cheese | $8. | 5347 N. Clark St | 41.979450 | -87.667957
23 | La Pane | Vegetarian Panino | $5.99 | 2954 W. Irving Park Rd | 41.954159 | -87.702711
24 | Pastoral | Cali Chèvre | $7.52 | Multiple location | 41.878114 | -87.629798
25 | Max’s Deli | Pastrami | $11.95 | 191 Skokie Valley Rd | 42.156707 | -87.803635
26 | Lucky’s Sandwich Co. | The Fredo | $7.50 | Multiple location | 41.878114 | -87.629798
27 | City Provisions | Smoked Ham | $12.95 | 1818 W. Wilson Ave | 41.965278 | -87.675542
28 | Papa’s Cache Sabroso | Jibarito | $7. | 2517 W. Division St | 41.902726 | -87.690228
29 | Bavette’s Bar & Boeuf | Shaved Prime Rib | $21. | 218 W. Kinzie St | 41.889368 | -87.634949
30 | Hannah’s Bretzel | Serrano Ham and Manchego Cheese | $9.79 | Multiple location | 41.878114 | -87.629798
31 | La Fournette | Tuna Salad | $9.75 | 1547 N. Wells St | 41.910526 | -87.634377
32 | Paramount Room | Paramount Reuben | $13. | 415 N. Milwaukee Ave | 41.889636 | -87.644840
33 | Melt Sandwich Shoppe | The Istanbul | $7.95 | 1840 N. Damen Ave | 41.915050 | -87.677805
34 | Floriole Cafe & Bakery | B.A.D. | $9. | 1220 W. Webster Ave | 41.921852 | -87.659212
35 | First Slice Pie Café | Duck Confit and Mozzarella | $9. | 5357 N. Ashland Ave | 41.979710 | -87.669344
36 | Troquet | Croque Monsieur | $8. | 1834 W. Montrose Ave | 41.961712 | -87.675816
37 | Grahamwich | Green Garbanzo | $8. | 615 N. State St | 41.892915 | -87.627826
38 | Saigon Sisters | The Hen House | $7. | Multiple location | NaN | NaN
39 | Rosalia’s Deli | Tuscan Chicken | $6. | 241 N. York Rd | 41.960813 | -87.939445
40 | Z&H MarketCafe | The Marty | $7.25 | 1323 E. 57th St | 41.791322 | -87.593861
41 | Market House on the Square | Whitefish | $11. | 655 Forest Ave | 41.706823 | -87.616186
42 | Elaine’s Coffee Call | Oat Bread, Pecan Butter, and Fruit Jam | $6. | Hotel Lincol | 41.915287 | -87.634389
43 | Marion Street Cheese Market | Cauliflower Melt | $9. | 100 S. Marion St | 41.886362 | -87.802230
44 | Cafecito | Cubana | $5.49 | 26 E. Congress Pkwy | 41.875821 | -87.626457
45 | Chickpea | Kufta | $8. | 2018 W. Chicago Ave | 41.896113 | -87.677857
46 | The Goddess and Grocer | Debbie’s Egg Salad | $6.50 | 25 E. Delaware Pl | 41.898979 | -87.627393
47 | Zenwich | Beef Curry | $7.50 | 416 N. York St | 41.910583 | -87.940488
48 | Toni Patisserie | Le Végétarien | $8.75 | 65 E. Washington St | 41.883106 | -87.625438
49 | Phoebe’s Bakery | The Gatsby | $6.85 | 3351 N. Broadwa | 41.943163 | -87.644507
In [84]:mapping = folium.Map(location=[41.8781136, -87.6297982], zoom_start = 11)

for idx, row in df.iterrows():
    if row["Address"] != "Multiple location":
        folium.Marker(
            location = [row["lat"], row["lng"]],
            popup = row["Cafe"],
            tooltip = row["Menu"],
            icon = folium.Icon(icon="coffee",prefix="fa")
        ).add_to(mapping)

mapping
Out[84]:Make this Notebook Trusted to load map: File -> Trust Notebook