Automatically backing up your files on a web server and uploading them to Google Drive using a Python script

Theviyanthan Krishnamohan
11 min read · Jun 17, 2019

This was originally published on my blog: https://www.thearmchaircritic.org/tech-journals/automatically-backing-up-your-files-on-a-web-server-and-uploading-them-to-google-drive-using-a-python-script/

Regularly backing up your web server content, or any content for that matter, is essential to avoid data loss in case of a malware attack or accidental deletion. If your web host doesn’t provide an automatic backup service, or if you have self-hosted your website on a cloud virtual machine, taking backups manually on a regular basis can be tedious.

Functionalities

To address this problem, I decided to create a Python script to fully automate the process. Since my blogs, including this one, all run on Docker containers, all I need to do is create an archive of the directory mapped to the containers, which contains my static files as well as the database.

Once the directory is archived, I need to store the archived file in a safe location. Google Drive gives you 15GB of free cloud storage space, and I thought that would be an ideal place to stash my backups. Additionally, I wanted to be notified of the status of the backup process: in case of an error, I should be able to address it quickly, and if the backup succeeds, knowing it has succeeded will assure me that the process is working as it is supposed to.

Before we start coding, let’s take a look at how our script is supposed to function.

  1. Create an archive of the directory or file.
  2. Authenticate the script with the Google Drive API service.
  3. Check for any previous backups.
  4. If there is more than one backup, then delete all the backups except the latest one.
  5. Upload the archive to Google Drive.
  6. Delete the archive file from the web server.
  7. Send a notification to your mobile phone.

Getting started with Google Drive API

To use the Google Drive API, we need a service token from Google. You can also access the API using an API key and access tokens, but that requires a web interface through which a user signs in to their account and authorizes access, which would make our simple script unnecessarily complex. Our script will only be used by one user, so we can directly grant it access to our Google Drive.

To that end, we need to create a service token. Before creating one, we need a Google Cloud project. So, head over to https://console.cloud.google.com and click on the combo box displaying the names of your projects next to the Google Cloud Platform logo. In the modal window that appears, click on New Project.

In the New Project page, enter a name for your project and click on Create. I have named my project “AutoBackup”.

Once done, select the name of your project from the combo box on the top bar.

In the side menu, hover over IAM & admin and select Service Accounts from the menu.

Then click on Create Service Account. Give it a name and an id, and click on Create. I have named my account “PythonScript”.

On the next page, click on Role and select Owner under Project. This will give the service account full access to all the resources in the project.

In the next screen, click Create Key and then select JSON. Once you click on Create, the created token file will be downloaded to your computer.

Keep this file safe and secure: you won’t be able to download it again, and if it is compromised, anyone can access your project using it.

Then, click on Done and you will be able to see the email ID of your service account.

Take a note of this as you will need it later.

Creating an archive

To archive our directory, we will compress it with the bz2 algorithm, which gives a better compression ratio than plain gzip (at the cost of some speed). Python has a built-in module called tarfile that helps us archive files and directories.

First, we need to import it into our script.

import tarfile

Let’s now create a function that accepts a directory path as an argument and returns the name of the backup archive it creates.

def archive(dir):
    print("Archiving directory %s." % dir)
    now = datetime.datetime.now().isoformat().replace(':', '_').split(".")[0]
    fileName = "backup_" + now + ".tar.bz2"
    with tarfile.open(fileName, "w:bz2") as tar:
        tar.add(dir)
        print("Directory successfully archived. Archive name: %s." % fileName)
        return fileName

Since the backup file will have the created date and time in its name, we should also import the datetime module.
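import datetime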

Here, we are storing the current date and time in a variable and then concatenating that to “backup_” to get the name of the archive file. For example, an archive created on 17 June 2019 at 09:30 would be named backup_2019-06-17T09_30_00.tar.bz2.

Then, using the with statement in Python, we are opening a tar file with bz2 compression (with the tar.bz2 extension) and adding the directory to that file. Once the archiving is successful, the name of the file is returned by the function.

Now, we have our backup file.

Creating a Google Drive Service

We need to create a Google Drive service to upload our file to Google Drive. So, let’s create a function that accepts the path to the token file we downloaded from the Google Cloud Console as an argument and returns the created service object.

First, you need to install the Google API Python Client using pip.

$ pip install --upgrade google-api-python-client

Then, import the following modules into the script.

from google.oauth2 import service_account

from googleapiclient.discovery import build

def createDriveService(token):
    SCOPES = ['https://www.googleapis.com/auth/drive']
    SERVICE_ACCOUNT_FILE = token
    credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
    return build('drive', 'v3', credentials=credentials)

Here, we specify the scope of our service as drive so that we can access our Google Drive. Then we create a credential object using the token file, and pass that credential object into the build method to get our service object. We can now access our Google Drive programmatically.

Removing old backups

As you may have already guessed, we are going to create another function that will accept the service object as an argument.

We will search our drive for archives whose names begin with “backup”. We will then store the ids of these files as keys and the names as values in a dictionary. If there is more than one file, we will create a list of names and sort it in ascending order. Because each name embeds the file’s creation date and time in ISO format, sorting the names alphabetically also sorts them chronologically, so the last member of the list will be the latest file. Then, we will delete all the files except the latest one.

def clearPastBackups(service):
    page = None
    filesObj = {}
    # Page through all matching files, since Drive may split the results
    while True:
        files = service.files().list(q="mimeType='application/gzip' and name contains 'backup'",
                                     pageToken=page,
                                     fields="nextPageToken, files(id,name)").execute()
        for file in files.get('files', []):
            filesObj[file.get('id')] = file.get('name')
        page = files.get('nextPageToken', None)
        if page is None:
            break
    if len(filesObj) >= 2:
        print("Two or more previous backups found.")
        # The names embed the creation timestamp, so the last name in
        # sorted order is the newest backup
        latest = sorted(filesObj.values())[-1]
        for name in sorted(filesObj.values()):
            print(name)
        print("Backup to be kept: %s." % latest)
        print("Deleting all but the latest backup...")
        for fileId in filesObj:
            if filesObj[fileId] != latest:
                service.files().delete(fileId=fileId).execute()
                print("Backup named %s deleted." % filesObj[fileId])

We use service.files().list() to search for files in our drive. The mimeType application/gzip specifies that we are searching for archive files (the same mimeType we will set when uploading). If there are many files, Google might split the list into pages and return only the first one. The while loop uses the page token to iterate over all the pages, so that at the end of the loop we will have all the matching files.

The rest of the code, I believe, is self-explanatory.

Uploading our archive file

When we upload a file using our service token, the file gets uploaded into the drive associated with our project account (not to our personal Google Drive). There is no graphical way to access it as far as I know. Since, ideally, we would want to download the files whenever we need them without having to write scripts, we need a way to access such files through a web interface.

An easy hack is to create a folder in your personal Google Drive and share it with the service account using the service account’s email address (the email address you were asked to note down earlier).

Here, I have created a folder called backup and shared it with my service account’s email address. After you share the folder, you need to obtain the folder’s id. You can do this by navigating into the folder: once inside, take a look at the URL, and the random string after /folders/ in the path is the id.
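For example, in a URL like https://drive.google.com/drive/folders/1aBcDeFgHiJkLmNoPqRs (an id made up for illustration), 1aBcDeFgHiJkLmNoPqRs would be the folder id.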

Now, we need to upload our backup file into this folder, so that we will be able to access it through our own Google Drive. A side benefit is that the file is owned by the service account and counts against its quota, so your personal Drive will not lose any storage space.

def upload(fileName, service):
    print("Beginning backup upload...")
    media = MediaFileUpload(fileName, mimetype="application/gzip", resumable=True)
    # Replace 'parentID' with the id of the folder you shared with the service account
    file = service.files().create(body={'name': fileName, 'parents': ['parentID']},
                                  media_body=media, fields='id').execute()
    print("Backup uploaded. Online backup file ID is %s." % file.get('id'))
    print("Setting backup permissions...")

    def callback(request_id, response, exception):
        if exception:
            # Handle error
            print(exception)
        else:
            print("Permission Id: %s" % response.get('id'))

    batch = service.new_batch_http_request(callback=callback)
    user_permission = {
        'type': 'user',
        'role': 'writer',
        'emailAddress': 'you@email.com'  # replace with your own Google account email
    }
    batch.add(service.permissions().create(
        fileId=file.get('id'),
        body=user_permission,
        fields='id',
    ))
    batch.execute()

The upload function accepts the name of the archive file and the service object as its arguments.

Import the following module into your script.

from googleapiclient.http import MediaFileUpload

We will create a media object using this module. Here, we pass the path of the archive file and the mimeType, and set resumable to true. Since our backup file is likely to be huge, setting this to true ensures that, in case of an interruption, the upload can resume from where it left off.

Then, we pass the name of the file and the media object to the create method to upload the file. To upload the file into the folder that we created, insert the id of the folder into the parents array (in place of 'parentID' above).

To be able to view and edit the file in our personal Google Drive, we need to grant ourselves permission to this file. To do so, create a user permission object and specify the email address of your Google account. Then, pass the file id and the user permission object into permissions().create, add the resulting request to the batch object, and call execute to grant the permission.

Deleting our archive file

Now, it’s time we mop up our workspace. You wouldn’t want the archive files you created to accumulate on your web server over time, hogging your storage space. So, let’s delete the file we created. To delete a file, we need to import the os module.

import os

Then, create a function that accepts the file path as an argument and deletes the file.

def clean(fileName):
    print("Deleting temporary files...")
    os.remove(fileName)
    print("Temporary files deleted. Backup complete!")

Sending notifications

To send a push notification to a mobile device, we will use Zapier (https://zapier.com/) and Pushbullet (https://www.pushbullet.com/). Zapier helps you integrate various online services, and Pushbullet lets you send and receive push notifications. Create an account on both of these sites.

In Zapier, create a new Zap. A Zap pairs a trigger with a predefined action that Zapier performs whenever the trigger fires. Add a Webhook as the trigger; it is triggered by sending an API request to an endpoint provided by Zapier.

In the next screen, choose Catch Hook and click on Continue to get the API endpoint. Now, we need to send a GET request to this endpoint so that Zapier can correctly identify the parameters we send.

To send HTTP requests in Python, you need a module called requests.

pip install requests

Let’s install it and then import it.

import requests

Then, let’s use the following function to send a GET request.

def notify(status, desc, now):
    print("Sending you a notification!")
    url = "zapierRestURL"  # replace with your Zapier webhook endpoint
    if status:
        print(requests.get(url=url, params={"status": "Backup successful!",
                                            "body": "A new backup was made to Google drive on %s successfully. The backup name is: %s." % (now, desc)}))
    else:
        print(requests.get(url=url, params={"status": "Backup failed!",
                                            "body": "Backup failed on %s. Error: %s." % (now, desc)}))

The status parameter denotes whether the backup process was a success or not. The desc parameter gives you either the error message or the name of the backup file. The now parameter gives you the current date and time.

Paste your Zapier API endpoint in place of “zapierRestURL”. Now, to configure our Zap, we need to run this method with mock data, so call it with status set to True and placeholder strings for desc and now.

notify(True, "file name", "today")

You should see the response printed to the console, something like <Response [200]>.

Now, go to your Zapier page and click on Ok, I did this.

On the next screen, you should see your status parameter value and the body parameter value. Now that we have configured our trigger, it’s time we configure our action.

Click on Continue and then add an action. Search for Pushbullet and select Send a Note.

Click on Connect an Account and connect your Pushbullet account to your Zapier account.

Click on Test to see if it succeeds and then move to the next screen.

Set a notification title by clicking on the button to the far right of the title text box. Choose Querystring Status. Choose the body string for the notification message option. Click on Continue.

Now, download and install Pushbullet on your phone from the Play Store and log into your account. Click on Send Test to Pushbullet on the webpage. You should get a push notification on your phone now.

Click on Finish on the webpage. Now, your notification flow is complete.

Now, call the functions in the following order in your Python script to take a backup.

  1. createDriveService
  2. archive
  3. clearPastBackups
  4. upload
  5. clean
  6. notify
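Put together, a minimal straight-line version of the script might look like the sketch below. It assumes the imports shown earlier; the token file name and the directory path are placeholders.

service = createDriveService("token.json")  # placeholder token file path
fileName = archive("/var/www/mysite")       # placeholder directory to back up
clearPastBackups(service)
upload(fileName, service)
clean(fileName)
date = datetime.datetime.now().strftime("%d %B, %Y (%A) at %I:%M %p")
notify(True, fileName, date)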

I have created a slightly more complex script that lets you pass the token file and the directory to be backed up as arguments when running the Python script.

# This snippet additionally needs: import sys, and from pathlib import Path
if len(sys.argv) > 2:
    try:
        token = sys.argv[2]
        service = createDriveService(token)
        if len(sys.argv) == 4 and sys.argv[1] == "backup":
            fileName = archive(Path(sys.argv[3]))
            clearPastBackups(service)
            upload(fileName, service)
            clean(fileName)
            date = datetime.datetime.now().strftime("%d %B, %Y (%A) at %I:%M %p")
            notify(True, fileName, date)
        elif sys.argv[1] == "clean":
            printFiles(service)
            removeAll(service)
        else:
            print("Argument format incorrect. The only arguments that can be used are 'backup' and 'clean'. Pass the token file as the second argument. If you choose 'backup', specify the directory to back up as the third argument.")
    except Exception as e:
        date = datetime.datetime.now().strftime("%d %B, %Y (%A) at %I:%M %p")
        print(e)
        notify(False, e, date)
else:
    print("Error: Not enough arguments. Choose either 'backup' or 'clean' as the first argument and pass the token file as the second. If you choose 'backup', specify the directory to back up as the third argument.")

You can obtain my complete script here. My script also includes an additional Clean method that removes all the backup files from your Google Drive just in case you need it.

Once done, you can create a cron job to run the script at specified intervals. I created a bash script file that executed the following command:

sudo python backup.py backup <token> <dir>
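The bash script itself is just a thin wrapper around this command; a sketch, with placeholder paths, might look like this:

#!/bin/bash
# backup.sh - illustrative wrapper; replace the paths with your own
cd /home/user/backup
sudo python backup.py backup service-token.json /var/www/mysite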

Then, I opened the crontab file using the crontab -e command and added an entry for this script to run on the first of every month.
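For example, a crontab entry along these lines (the script path and the 3 a.m. start time are illustrative) would run the backup on the first of every month:

0 3 1 * * /home/user/backup/backup.sh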

Now, my blogs will be backed up and uploaded to Google Drive every month, and when the process completes, I will receive a notification on my mobile phone.

