Thursday, 15 June 2017

Web Scraping

Web Scraping using Python

Hi all, I'll show you how to get started with web scraping in Python. In this script we scrape AccuWeather to check the current temperature in your city.


We are going to use two modules in this tutorial.

1) Beautiful Soup is the module we will use for scraping, that is, for parsing and searching the HTML.
2) Requests fetches the data from the web page.

Make sure Beautiful Soup is installed on your system, as shown below. If it is not, install both modules with pip install beautifulsoup4 requests.

shanky@Unity:~$ pip list |grep beautifulsoup4
beautifulsoup4 (4.5.3)
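
The same check can also be done from Python itself. The snippet below is only a minimal sketch: the helper name missing_modules is made up for this example, and it relies on the fact that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# beautifulsoup4 is imported as "bs4", so that is the name to look for.
missing = missing_modules(["bs4", "requests"])
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("bs4 and requests are both installed")
```

If anything is reported as missing, install it with pip before running the script.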

Code:

#imports  

from bs4 import BeautifulSoup as bs
import requests

#requests.get() connects to the given URL and downloads the web page.
#Replace "url" below with the page for your city: go to AccuWeather and select your city.
#For example, for Bengaluru (Bangalore) the URL would be "http://www.accuweather.com/en/in/bengaluru/204108/weather-forecast/204108"

page = requests.get("url")

#Next we parse the downloaded content as HTML, so that we can search it by tag and attribute.

soup = bs(page.content, 'html.parser')

#Here we search the parsed HTML for the first <span> with class="large-temp". To see the page's HTML, right-click on the page and choose Inspect Element in your browser.
#get_text() returns the text inside the matched tag.

x = soup.find_all('span', class_="large-temp")[0].get_text()

#Slice off the first two characters to get the temperature; this assumes a two-digit value.
#In Python 3, get_text() already returns a str, so no encode() is needed. Wrapping encoded bytes in str() would print b'..' instead of the number.

print("Current temperature is " + x[0:2] + "C")
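
The fixed slice x[0:2] only works for two-digit temperatures, and indexing [0] fails if the page has no matching tag. The sketch below shows one way to handle both cases. It is not AccuWeather's real markup: SAMPLE_HTML and the function name extract_temperature are made up for illustration, and a regular expression is swapped in for the slicing step.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div class="current">
  <span class="large-temp">-3&deg;</span>
  <span class="large-temp">12&deg;</span>
</div>
"""


def extract_temperature(html):
    """Return the first large-temp value as an int, or None if not found."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first match, or None when nothing matches.
    tag = soup.find("span", class_="large-temp")
    if tag is None:
        return None
    # Pick out an optional minus sign followed by digits, whatever the length.
    match = re.search(r"-?\d+", tag.get_text())
    return int(match.group()) if match else None


temperature = extract_temperature(SAMPLE_HTML)
if temperature is None:
    print("Could not find the temperature on the page")
else:
    print("Current temperature is " + str(temperature) + "C")
```

To use it on the real page, pass page.content from the script above in place of SAMPLE_HTML.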


That's it. For more information, contact me or read up on BeautifulSoup; it packs many more powerful functions for web scraping.

FYI, I am using AccuWeather for educational purposes only. If you use a different website, you will need to modify the code to match it.
