DER Data & Analytics Blog

Work Smart, Not Hard: From raw data to finished presentation - fully automated!

Written by Admin | Dec 15, 2021 11:00:00 AM

Automation takes work off our hands and makes our lives easier. Instead of executing individual processes manually and irregularly, an automated pipeline can be created that includes all the necessary work steps and executes them independently according to a set schedule. We have successfully completed such a project, which can serve as a prototype for other projects, and would like to give a rough description of it in this article.

Initial situation

The project involved analysing a Europe-wide survey that is conducted at regular intervals. As a starting point, we had raw data from two different market research companies. Previously, this data was evaluated manually after each round of questions and processed into graphics and analyses. The aim of our project was to automatically consolidate the respective survey data, prepare it quantitatively and visually and thus minimise the recurring effort as much as possible.

Our task was to automatically produce the following documents for each country in the survey:

  • A Google Sheets Slidedeck with a graphic presentation of the individual questions, a table of contents and content slides formatted and designed according to a template
  • A Google Sheets document that shows the individual questions again in detail and with various aggregations and groupings

The project also included the following services:

  • A consolidated Europe slide deck with comparisons between the individual countries
  • A Google Data Studio dashboard for simple, personalised analysis of all data

Procedure

We proceeded as follows to realise the various documents and preparations:

Data

In order to create a comprehensive data basis, we used a Python script to import the historical monthly data from both sources and merged them into a csv file. As the data was collected by two different market research companies, it had to be prepared once in order to be compatible in terms of formatting. From now on, the monthly data can be added to the data basis without further effort and further processed in a bundled form.

Python (symbolic image) / pixabay

As we were only provided with the raw data, i.e. the actual survey data, we used another Python script to weight the responses based on age, gender and region of origin in the participants' country of origin in order to obtain the desired distribution per country.

Based on this weighted data, we created various analyses. Firstly, a separate presentation with the most important facts and insights was created for each country, which provides the respective market managers with important findings. In addition, all available survey data (questions, answers, countries, years) were to be made accessible in an interactive Data Studio dashboard to give interested parties the opportunity to take a deep dive into the data. Finally, the data should be presented in tabular form for further analysis.

As Google Workspace offers well-developed interfaces (APIs) for developers, we decided to use this environment for the presentation of our analyses. All APIs and calculations were implemented in Python.

Google Slides - Slide Decks

The GSlides API makes it possible to create a presentation completely "from scratch" programmatically. We have taken advantage of this. As the survey is conducted in 24 countries and the evaluation is to be implemented analogue in all countries, we used a script to build a framework that is fed with the data of the respective country.

This enabled us to create 24 analogue presentations with the effort of just one. All images, texts and graphics were also added to the presentation via API so that no element had to be inserted manually.

In order to adapt the headlines of the slides to the statement of the analysed data, we created a Google Sheet in which every authorised user can update and adapt the texts. These are automatically integrated into the slides via placeholders when the presentations are created. This means that the headings can be included without having to adjust the slides manually.

Example of a slide generated via the Google Slides API (original in colour)

Thanks to the consistent data situation, a new data point can be displayed dynamically in the next month without changing the pipeline in the presentation. All you have to do is re-run the script that creates the visualisations in the presentation based on the data.

Google Data Studio - Interactive dashboard

We created the interactive dashboard with Google Data Studio . To do this, the data is loaded into the Google Big Query data warehouse, which has a direct connector to Google Data Studio. All charts can now be filtered according to age, gender, country and other factors in order to answer individual questions and analyse the survey data as required.

Examples from the Data Studio Dashboard (original in colour)

Google Sheets - Detailed quantitative data

In order to not only visualise the survey data in an appealing way at the end, but also to enable further calculations, the data was also processed in a Gsheet. Google's Sheets API was used for this, which was also implemented in Python. This reads the survey data from the csv file and writes it to a Gsheet in a structured format.

Example of a data table

Result

After successful implementation, the solution consists of the following process, which generates the three deliverables slide decks, dashboard and data sheet from the raw data.

Data flow of the implementation

The presentations, dashboard and table analyses can be shared via a link or email address, making it easy to disseminate the information.

Now a complete evaluation and visualisation can be carried out "at the touch of a button" every month after data collection and distributed to all employees promptly without a great deal of manual effort.

Do you have a similar project or need? Please contact us for further information or professional support!