As a follow-up on web scraping, I’d like to continue with something that becomes very useful and powerful once you have collected data that way: automation.
Don’t get scared now.
Automation sounds scarier and more complex than it actually is. I’ll give you a few examples to get more familiar with the word:
Automation basically means letting your computer (or device) do something automatically (duh!). A basic example is your phone automatically updating its apps as soon as a new update is available. Another example is your iPhone installing a new iOS version, or your computer installing an update, when one becomes available. If that happens on your phone, it’s most likely because you switched on that option somewhere in the settings, probably in a very user-friendly, straightforward interface.
Behind the scenes it’s nothing more than writing a script that says the following:
IF ‘update …’ is available AND there is an agreement to automatically update, run the update.
This is probably the most basic example. On top of that, automation can become more complex: you can include a whole list of conditions and exceptions. This is pretty much what happens in the back end when you switch the option on or off in the interface on your phone. To build something like that, you write a script that, when it runs, goes through a checklist of everything that must be true before it actually does something.
Updating your iPhone
So the update only runs when you have given permission to update automatically (check), you are connected to wifi (check), and your phone is plugged into a charger (check).
So when an update is available, you gave permission to update automatically, and you are connected to wifi but NOT plugged in, it will not update. The script ran through its checklist, but one of the boxes that needed to be ticked is not ticked, so it will not update your phone. And this is only one example of automation. The possibilities are endless (web scraping, updating your laptop, downloading a csv file, uploading something online). Every automation may come with its own complexities or requirements. Maybe you need an API key, or the result needs to be loaded into a database.
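To make the checklist idea concrete, here is a minimal Python sketch of it. The function name should_update and its four parameters are made up for illustration; this is not how a phone actually implements updates, just the same IF ... AND ... logic written as code.

```python
def should_update(update_available, auto_update_allowed, on_wifi, plugged_in):
    """Return True only when every box on the checklist is ticked."""
    checklist = [update_available, auto_update_allowed, on_wifi, plugged_in]
    return all(checklist)


# Update available, permission given, on wifi, but NOT plugged in: no update.
print(should_update(True, True, True, False))  # prints False

# Every box ticked: the update runs.
print(should_update(True, True, True, True))  # prints True
```

Adding another condition to the automation is then just one more item on the checklist.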
Anyway, I’m not going to make it more complex than it is, because it isn’t. We’ll stick to automating web scraping for now. The most important thing is that you get the gist of it; once you have, Google is your best friend for finding the right solution to your specific issue.
Once you have written your script, tested it, and confirmed that it works, import the time library in your script if you haven’t done so yet. With time, you can make the script pause and run again at whatever interval you want: once a day, week, or month, or multiple times a day, whatever you need.
import time
# Example 1: run every 60 seconds
time.sleep(60)  # the script 'sleeps' for 60 seconds before it runs again
# Example 2: run every 8 hours
time.sleep(60 * 60 * 8)  # 8 hours, written out in seconds
Once you have set time.sleep() to every 8 hours (or whatever your preference is), you can add a variable that ends the timer, for example after the script has run twice. You do the following:
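import time
end_timer = time.time() + 2 * 60 * 60 * 8  # stop time: two 8-hour intervals from now
while time.time() < end_timer:
    # your scraping code goes here
    time.sleep(60 * 60 * 8)  # wait 8 hours before the next run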
This makes sure the script stops after running twice, 8 hours apart (if you continue from my earlier example).
Now, if you start up your computer every morning and run the script, it will run twice that day. But what about the day after? You start up your computer and run the script manually, again. Not a big deal for one script, but if you have several, with different purposes, different times, and different days, starting your day like this becomes quite tedious. And it’s not necessary at all!
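If you prefer counting runs over watching the clock, the same idea can be sketched as a small helper. The names run_repeatedly and scrape are hypothetical, and scrape is only a stand-in for your own scraping code; treat this as one possible way to do it, not the only one.

```python
import time


def run_repeatedly(job, interval_seconds, max_runs, sleep=time.sleep):
    """Call job() max_runs times, pausing interval_seconds between runs."""
    results = []
    for run_number in range(max_runs):
        results.append(job())
        # No need to wait after the final run.
        if run_number < max_runs - 1:
            sleep(interval_seconds)
    return results


def scrape():
    # Hypothetical placeholder: your real scraping code would go here.
    return "scraped"


# Two runs, 8 hours apart (uncomment to use; this really waits 8 hours):
# run_repeatedly(scrape, interval_seconds=60 * 60 * 8, max_runs=2)
```

The sleep parameter is only there so you can try the helper without actually waiting, for example by passing a function that does nothing.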
Also, if you scrape a lot of data, be careful not to run your scripts too often. Think about whether you really want or need to run a script every hour, and whether it is worth the time. Advanced systems can recognize heavy scraping as bot traffic and block your scripts from a website, and you don’t want to end up blocked for that reason.
Luckily, both Mac and Windows have a task scheduler or task automator. The steps below are for the Windows Task Scheduler. To be able to run your script from it, you need to set up a text file and save it as a .bat file. I’ll take you through the steps.
1. Create a .bat file.
This is necessary because a .bat file holds a list of commands that need to be processed. You can create one by opening a text editor and putting the following information in it:
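One common way to go easy on a website is to pause for a moment between requests. The sketch below assumes a made-up helper called polite_pause and a made-up list of pages; the delay values are arbitrary, and a pause like this is no guarantee that a site won’t block you.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Wait a random amount of time between two requests and return the wait."""
    wait = random.uniform(min_seconds, max_seconds)
    sleep(wait)
    return wait


# Hypothetical list of pages to scrape, one after another.
pages = ["page-1", "page-2", "page-3"]
for page in pages:
    # Your request for `page` would go here.
    polite_pause(0.1, 0.2)  # short values so this demo finishes quickly
```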
- File path of your Python application
- File path of your Python script
You need both because your computer first has to find your Python program before it can actually run your Python script.
The file path of your Python application is the path where you installed Python on your computer. For example: C:\Users\…\Anaconda4\python.exe.
The file path of your Python script is where you saved your Python file on your computer. For example: C:\Users\…\Desktop\automate_webscrape.py.
Once you have both, copy-paste them into your text file on a single line, Python application first, each wrapped in quotation marks and with a space in between, and save the file with a .bat extension.
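If you would rather let Python put that line together for you, a small sketch could look like this. The function name build_bat_line and the file name run_scraper.bat are my own inventions, and the two paths are the placeholder examples from above, so replace them with your own.

```python
def build_bat_line(python_exe, script_path):
    """Return the single command line that goes into the .bat file."""
    # Quotation marks keep paths that contain spaces in one piece.
    return f'"{python_exe}" "{script_path}"'


# Placeholder paths copied from the examples above; fill in your own.
python_exe = r"C:\Users\…\Anaconda4\python.exe"
script_path = r"C:\Users\…\Desktop\automate_webscrape.py"

bat_line = build_bat_line(python_exe, script_path)

# To actually write the file (the name run_scraper.bat is hypothetical):
# with open("run_scraper.bat", "w") as bat_file:
#     bat_file.write(bat_line + "\n")
```

The result is one line: the quoted path to python.exe, a space, and the quoted path to your script.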
2. Once that’s done, open your task scheduler on your computer.
Create a basic task in the task scheduler and follow the steps. Set the schedule according to your preferences: daily, weekly, or monthly. When you get to “What action do you want the task to perform”, select “Start a program” and select your .bat file. Once this is set up, the task should run at the given time, as long as your computer is on. Obviously, to know whether you set it up correctly, test it ;).
I hope this gave you a bit more insight into the world of automation. Of course, this is just a small example of the possibilities that are out there, but it does show that it doesn’t have to be that complex, or expensive.
Any questions? Don’t hesitate to contact me.