Published on

From Microsoft Form to Google Sheet Made Easy

Authors

Intro

In 2023, my organization received a Microsoft grant for non-profits, which provided benefits such as Microsoft 365, Azure, and several other Microsoft products. (Maybe I'll write an article about the benefits later.)

But, since 2020 we have been using Google Workspace for our operations. We have evaluated the pros and cons from the perspective of collaboration after receiving the grant. After assessing our minimal operational workflow and tools, we decided to use Google Workspace for daily operations and Microsoft products for data-related tasks, including data processing.

Use Case and Problem

The problem arises when we need to use Microsoft 365 to gather data using Microsoft Forms (because the feature is only available there) and the responses need to be analyzed in real-time but also easily accessible by the operational team. If real-time analysis wasn’t required, using the default Microsoft Forms features would suffice. We could create an Excel template and process the data manually before analysis, which is similar to batch processing but done manually. However, as mentioned, the operational team needs to access the analyzed data from anywhere at any time. Therefore, I needed to ensure the data processing and analytics were updated at least hourly.

In Microsoft 365, we can use Power Automate to gather data from Microsoft Forms automatically. However, we would have to pay for this service, which is not included in the Microsoft non-profit grant. So, I decided to use an API to fetch the data easily. Unfortunately, Microsoft's API documentation is quite lacking. After spending a long time trying to learn about the Microsoft Forms API, I got inspiration from manually downloading the responses.

Solution

1. Get Response File Direct Download URL

When I first downloaded the response file, I noticed that the source of the downloaded file was a blob URL. It was tricky because it had a different unique ID each time.

Response file blob URL

So, I found another way to get the response file link by inspecting network activity using dev tools. Success!

Response file blob URL

The link follows a format like this:

https://forms.office.com/formapi/DownloadExcelFile.ashx?formid=[XXxxxFORM_IDxxxXX]&timezoneOffset=-420&__TimezoneId=Asia/Jakarta&minResponseId=1&maxResponseId=1116

You can see in the picture above that the maxResponseId is highlighted in red. It should be removed to ensure all response data is downloaded. If it's not removed, only the number of responses up to the written maximum will be shown.

Make sure you already have an application registered in Microsoft Azure. If you don’t, I won’t explain how to set it up here. Assuming you already have an app, here's what you need to do to grant access to Microsoft Forms API:

  1. Go to Microsoft Entra ID > App Registration > (Choose Your App) > API Permission > Add a Permission > APIs My Organization > Type: Microsoft Forms.
  2. Select Delegated Permission and check all permissions. Done!

3. Enable Sheets API Access

To access spreadsheets via the Google Sheets API, you need to authenticate and authorize your application. You can read more about this in this link.

4. Automating with Python

After getting the response file download URL and granting Microsoft Forms API Permission, we can use Python to retrieve and export the data to a Google Spreadsheet. Here are the requirements:

  • requests
  • gspread
  • pandas

First, import the modules and request an access token using the requests module.

import requests
import pandas as pd
import gspread

login_url = 'https://login.microsoftonline.com/TENANT_ID/oauth2/token'

request_body = {
    'grant_type': 'password',
    'username': 'YOUR_AUTHORIZED_EMAIL',
    'password': 'YOUR_PASSWORD',
    'client_id': 'CLIENT_APP_ID',
    'client_secret': 'CLIENT_APP_SECRET',
    'resource': 'https://forms.office.com',
}

response = requests.post(url=login_url, data=request_body, headers={'Content-Type': 'application/x-www-form-urlencoded'})
access_token = response.json()['access_token']
print(access_token)

You may notice: "Why are you using username and password method in this code? It’s not secure."

Yeah, I know. I’ll explain that later :)

Back to the topic. After we get the access_token, we can retrieve the response data from Microsoft Forms and convert it to a DataFrame using pandas.

forms_url = 'https://forms.office.com/formapi/DownloadExcelFile.ashx?formid=[XXxxxFORM_IDxxxXX]&timezoneOffset=-420&__TimezoneId=Asia/Jakarta&minResponseId=1'
form_request = requests.get(url=forms_url, headers={'Authorization': f'Bearer {access_token}'}, allow_redirects=True)
print(form_request.headers.get('content-type'))
# open('response.xlsx', 'wb').write(form_request.content)  # If you want to save it as response.xlsx
df = pd.read_excel(form_request.content)
df.head(5)  # Preview the data

Since the Sheets API doesn’t handle Python datetime objects, you'll need to convert them to strings. I also fill missing values (NA) with null. You can do further data cleansing at this step.

df['Start time'] = df['Start time'].astype(str)
df['Completion time'] = df['Completion time'].astype(str)
df = df.fillna('')

After that, we use gspread to export the DataFrame to Google Sheets. You only need the spreadsheet ID, which looks like this: https://docs.google.com/spreadsheets/d/SPREADSHEET_ID

credentials = {
  "type": "service_account",
  "project_id": "YOUR_PROJECT_ID",
  "private_key_id": "PRIVATE_KEY_ID",
  "private_key": "PRIVATE_KEY",
  "client_email": "CLIENT_EMAIL",
  "client_id": "CLIENT_ID",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "YOUR_CLIENT_X509_CERT_URL",
  "universe_domain": "googleapis.com"
}

gc = gspread.service_account_from_dict(credentials)

# Automate Data Push
sh = gc.open_by_key('TARGET_SPREADSHEET_ID')
ws = sh.worksheet('NAME_OF_SHEET_YOU_WANT_TO_FILL')
ws.update([df.columns.values.tolist()] + df.values.tolist())

That's it! You can check the targeted spreadsheet, which will be filled with Microsoft Forms responses. You can schedule this script to run using cron jobs or Airflow. That’s why I’m using the username and password in this case. Afterward, you can perform any quick "real-time" analysis to share with your co-workers.