Extract Data From JustDial using Selenium
Let us see how to extract data from Justdial using Selenium and Python. Justdial is a company that provides local search for different services in India over the phone, website and mobile apps. In this article we will be extracting the following data:
- Company name
- Address
- Phone number
We can then save the data in a CSV file.
Approach:
- Import the following modules: webdriver from selenium, ChromeDriverManager, pandas, time and os.
- Use the driver.get() method and pass the link you want to get information from.
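- Use the driver.find_elements_by_class_name() method and pass 'store-details' to collect every listing on the page. This helper belongs to the Selenium 3 API; in Selenium 4 the equivalent call is driver.find_elements(By.CLASS_NAME, 'store-details').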
- Instantiate empty lists to store the values.
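- Iterate over storeDetails and fetch the individual details that are required from each listing.
- Create a user-defined function strings_to_num() that maps the class-name suffix of each phone-number icon to the digit or symbol it represents. The page encodes the phone number in these class names rather than in plain text.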
- Display the details and save them as a CSV file according to the requirements.
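The conversion step is easier to follow on sample data, without a browser. The sketch below is only illustrative: the helper name decode_phone, the constant ICON_TO_CHAR and the sample class strings (assumed to look like 'mobilesv icon-yz') are hypothetical, while the suffix-to-character table is the one used in this article.

```python
# Suffix-to-character table taken from the article's strings_to_num().
ICON_TO_CHAR = {
    'dc': '+', 'fe': '(', 'hg': ')', 'ba': '-',
    'acb': '0', 'yz': '1', 'wx': '2', 'vu': '3', 'ts': '4',
    'rq': '5', 'po': '6', 'nm': '7', 'lk': '8', 'ji': '9',
}


def decode_phone(class_attributes):
    """Turn a list of class strings into the phone number they spell out."""
    chars = []
    for class_attr in class_attributes:
        # 'mobilesv icon-yz' -> ['mobilesv icon', 'yz'] -> 'yz'
        suffix = class_attr.split('-')[1]
        # Assumption: unknown suffixes are skipped instead of inserting a marker.
        chars.append(ICON_TO_CHAR.get(suffix, ''))
    return ''.join(chars)


# Hypothetical class attributes, one per icon in the phone number.
sample = ['mobilesv icon-dc', 'mobilesv icon-ji',
          'mobilesv icon-yz', 'mobilesv icon-acb']
print(decode_phone(sample))  # +910
```

Skipping unknown suffixes is a design choice made for this sketch; returning a visible marker instead makes unmapped icons easier to spot while debugging.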
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import time
import os

# Start Chrome; webdriver_manager downloads a matching chromedriver.
driver = webdriver.Chrome(ChromeDriverManager().install())

# Placeholder URL: replace it with the Justdial listing page you want to scrape.
driver.get('https://www.justdial.com/')


def strings_to_num(argument):
    # Map the suffix of an icon's class name to the character it displays.
    switcher = {
        'dc': '+',
        'fe': '(',
        'hg': ')',
        'ba': '-',
        'acb': '0',
        'yz': '1',
        'wx': '2',
        'vu': '3',
        'ts': '4',
        'rq': '5',
        'po': '6',
        'nm': '7',
        'lk': '8',
        'ji': '9'
    }
    return switcher.get(argument, "nothing")


# The find_element(s)_by_class_name helpers are the Selenium 3 API.
# In Selenium 4, use find_elements(By.CLASS_NAME, ...) with
# "from selenium.webdriver.common.by import By".
storeDetails = driver.find_elements_by_class_name('store-details')

nameList = []
addressList = []
numbersList = []

for store in storeDetails:
    name = store.find_element_by_class_name('lng_cont_name').text
    address = store.find_element_by_class_name('cont_sw_addr').text
    contactList = store.find_elements_by_class_name('mobilesv')

    # Each icon's class name ends in a suffix that encodes one character.
    myList = []
    for contact in contactList:
        myString = contact.get_attribute('class').split("-")[1]
        myList.append(strings_to_num(myString))

    nameList.append(name)
    addressList.append(address)
    numbersList.append("".join(myList))

data = {'Company Name': nameList,
        'Address': addressList,
        'Phone': numbersList}
df = pd.DataFrame(data)
print(df)

# Append the rows to demo1.csv; no header row is written.
df.to_csv('demo1.csv', mode='a', header=False)

driver.quit()
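Because the script appends with header = False, demo1.csv never receives a header row. One possible way to write the header only on the first run is sketched below. Note that it uses the standard-library csv module rather than pandas, and that the helper name append_rows is hypothetical and not part of the original script.

```python
import csv
import os


def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only once."""
    # A missing or empty file still needs its header row.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows('demo1.csv', rows, ['Company Name', 'Address', 'Phone']) on repeated runs would then keep a single header line at the top of the file, where rows is a list of dictionaries keyed by those column names.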