A Ruby on Rails application that scrapes the pages listed in an uploaded CSV file and counts how often a given link occurs on each page.
In the application, the user passes a CSV file and a list of email addresses to which the parsed CSV will be sent.
The CSV has two columns:
- refferal_link: the page to scrape
- home_link: the link whose occurrences are counted on that page
Each following row holds the values for one page.
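The sample values from the original post are not reproduced here. The snippet below uses two made-up rows (hypothetical URLs) to show the expected layout, and how Ruby's standard CSV library, with the same `header_converters: :symbol` option the job uses later, turns each header into a symbol key.

```ruby
require 'csv'

# Hypothetical input file: one page to scrape per row, plus the link to count on it
sample = <<~TEXT
  refferal_link,home_link
  https://blog.example.com/post-1,https://example.com/
  https://blog.example.com/post-2,https://example.com/pricing
TEXT

# header_converters: :symbol turns the "refferal_link" header into :refferal_link
rows = CSV.parse(sample, headers: true, header_converters: :symbol).map(&:to_h)

rows.each do |row|
  puts "scrape #{row[:refferal_link]}, count links to #{row[:home_link]}"
end
```

Note that the column name must be spelled exactly `refferal_link`, because the job looks up `row_map[:refferal_link]`.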
First, create the Rails application:
$ rails new scrape_data
$ cd scrape_data
Next, generate the UploadCsv scaffold:
$ rails g scaffold UploadCsv generated_csv:string csv_file:string
This creates the model, controller, views, and migration for the upload_csvs table, with the string columns generated_csv and csv_file.
Start by letting the user upload the file. Replace the contents of app/views/upload_csvs/_form.html.erb with the code below, which adds a file field for csv_file:
<%= form_with(model: upload_csv, local: true) do |form| %>
  <% if upload_csv.errors.any? %>
    <div id="error_explanation">
      <h2><%= pluralize(upload_csv.errors.count, "error") %> prohibited this upload_csv from being saved:</h2>
      <ul>
        <% upload_csv.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>
  <div class="field">
    <%= form.label :csv_file %>
    <%= form.file_field :csv_file %>
  </div>
  <div class="actions">
    <%= form.submit %>
  </div>
<% end %>
Next, add the CarrierWave gem to handle the file upload. Add this line to the Gemfile and install it:
gem 'carrierwave', '~> 2.0'
$ bundle install
Then generate a CarrierWave uploader:
$ rails generate uploader Avatar
Mount the uploader on the csv_file column in app/models/upload_csv.rb:
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end
Before going further, check that the application works. Create and migrate the database:
$ rake db:create db:migrate
Then update config/routes.rb so the upload list is the root page:
Rails.application.routes.draw do
  resources :upload_csvs
  root 'upload_csvs#index'
end
Start the server and open http://localhost:3000:
$ rails s
Next, create a background job that reads the uploaded CSV, scrapes each page, and saves the path of the generated CSV in the record's generated_csv column. Generate the job:
$ rails generate job genrate_csv
Add the HTTParty and Nokogiri gems to the Gemfile and run bundle install:
gem 'httparty'
gem 'nokogiri'
Then replace the contents of app/jobs/genrate_csv_job.rb with:
require 'csv'

class GenrateCsvJob < ApplicationJob
  queue_as :default

  def perform(upload_csv)
    processed_csv(upload_csv)

    # Write the results to public/product_data.csv (overwritten on every upload)
    file = "#{Rails.root}/public/product_data.csv"
    headers = ['referal_link', 'home_link', 'count']
    CSV.open(file, 'w', write_headers: true, headers: headers) do |writer|
      @new_array.each { |row| writer << row }
    end

    # Save the path of the generated file on the record
    upload_csv.update(generated_csv: file)

    # NotificationMailer is generated in the mailer steps below
    NotificationMailer.send_csv(upload_csv).deliver_now! if @new_array.present?
  end

  # Fetches each refferal_link page and counts the <a> tags whose href exactly equals home_link
  def processed_csv(upload_csv)
    @new_array = []
    CSV.foreach(upload_csv.csv_file.path, headers: true, header_converters: :symbol) do |row|
      row_map = row.to_h
      page = HTTParty.get(row_map[:refferal_link])
      page_parse = Nokogiri::HTML(page.body)
      link_array = page_parse.css('a').map { |link| link['href'] }
      link_count = link_array.count(row_map[:home_link])
      @new_array.push([row_map[:refferal_link], row_map[:home_link], link_count.to_s])
    end
  end
end
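Stripped of the HTTP request and HTML parsing, the per-row work in processed_csv is counting how often home_link appears among a page's href values. The sketch below assumes the hrefs have already been extracted (the arrays are made up), and uses `Array#tally` (Ruby 2.7+) with a default of 0 when the link never appears. `build_result_row` is a hypothetical helper, not part of the job above.

```ruby
require 'csv'

# Hypothetical helper: one iteration of processed_csv, minus HTTParty/Nokogiri
def build_result_row(referral_link, home_link, hrefs)
  counts = hrefs.compact.tally # { "href" => occurrences }, ignoring <a> tags without href
  [referral_link, home_link, counts.fetch(home_link, 0).to_s]
end

# Made-up hrefs standing in for page_parse.css('a').map { |link| link['href'] }
pages = {
  'https://blog.example.com/post-1' => ['https://example.com/', '/about', 'https://example.com/', nil],
  'https://blog.example.com/post-2' => ['/contact']
}

rows = pages.map { |url, hrefs| build_result_row(url, 'https://example.com/', hrefs) }

# Same header row the job writes to public/product_data.csv
output = CSV.generate do |csv|
  csv << %w[referal_link home_link count]
  rows.each { |row| csv << row }
end
puts output
```

Keeping this counting logic in a plain method makes it easy to test without network access; only the href extraction needs a real page.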
Then enqueue the job once a new UploadCsv record has been saved, and validate that a csv_file is present. Update app/models/upload_csv.rb:
class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
  validates :csv_file, presence: true
  # Enqueue only after the transaction commits, so the job can load the record
  after_create_commit :processed_csv

  def processed_csv
    GenrateCsvJob.perform_later(self)
  end
end
After you upload a file, the job writes the generated CSV to scrape_data/public/product_data.csv, where you can check the results.
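To check the output from a script or console rather than opening the file, a sketch like the one below lists the pages where home_link was not found (a count of 0 or empty). To keep it self-contained it first writes a small made-up file to the temp directory; against a real run you would pass the path of public/product_data.csv instead. `pages_without_link` is a hypothetical helper.

```ruby
require 'csv'
require 'tmpdir'

# Returns the referal_link of every row whose count is 0 (home_link not found)
def pages_without_link(path)
  CSV.read(path, headers: true)
     .select { |row| row['count'].to_i.zero? }
     .map { |row| row['referal_link'] }
end

# Made-up stand-in for public/product_data.csv so the sketch runs on its own
sample_path = File.join(Dir.tmpdir, 'product_data_sample.csv')
File.write(sample_path, <<~TEXT)
  referal_link,home_link,count
  https://blog.example.com/post-1,https://example.com/,2
  https://blog.example.com/post-2,https://example.com/,0
TEXT

missing = pages_without_link(sample_path)
puts "home_link not found on: #{missing.join(', ')}"
```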
To send the generated CSV by email, first generate the mailer:
$ rails generate mailer NotificationMailer
Then update app/mailers/notification_mailer.rb:
class NotificationMailer < ApplicationMailer
  def send_csv(upload_csv)
    @greeting = 'Hi'
    # generated_csv holds the file path saved by GenrateCsvJob
    attachments['parsed.csv'] = File.read(upload_csv.generated_csv)
    mail(to: "[email protected]", subject: 'CSV is parsed successfully.')
  end
end
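The mailer sends to a single fixed address, although the introduction mentions a list of recipients. Action Mailer's `mail(to:)` also accepts an array of addresses, so one option, sketched here under the assumption that the addresses arrive as one comma-separated string (no such form field is created in this tutorial), is to split and validate them first. `parse_recipients` is a hypothetical helper; `URI::MailTo::EMAIL_REGEXP` is the email pattern from Ruby's standard `uri` library.

```ruby
require 'uri'

# Hypothetical helper: "a@x.com, b@y.com" -> ["a@x.com", "b@y.com"]
def parse_recipients(raw)
  raw.to_s
     .split(/[,;\s]+/)   # separate on commas, semicolons or whitespace
     .reject(&:empty?)   # drop the empty piece left by leading separators
     .uniq
     .select { |email| email.match?(URI::MailTo::EMAIL_REGEXP) }
end

recipients = parse_recipients('ops@example.com, dev@example.com; not-an-email, ops@example.com')
p recipients
```

The resulting array could then be passed as the `to:` value in send_csv; invalid entries are silently dropped here, so you may prefer to surface them as a validation error instead.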
Also configure Action Mailer in config/environments/development.rb (or config/environments/production.rb) by adding these lines:
config.action_mailer.default_url_options = { host: 'sample-scrape.herokuapp.com', protocol: 'https' }
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  user_name: '[email protected]',
  password: '*******123456', # placeholder; keep real credentials out of source control
  domain: 'gmail.com',
  address: 'smtp.gmail.com',
  port: 587,
  authentication: :plain,
  enable_starttls_auto: true
}
config.action_mailer.raise_delivery_errors = false
Finally, update the email view in app/views/notification_mailer/send_csv.html.erb:
<h1>CSV has been processed, Thanks!</h1>
<p>
  <%= @greeting %>, please find the parsed CSV attached.
</p>
Thank you!