Data is an important part of the Sunbird platform.

Two types of data are generated on Sunbird: telemetry and platform metadata. Telemetry data is generated by user actions on the platform, while platform metadata is associated with platform assets.

Telemetry data consists of a set of telemetry events. Each event is an atomic unit of data that records an individual user action along with the context in which the action happened. Telemetry events are stored as text files that downstream programs can ingest and that are also human-readable.

Platform metadata is also generated indirectly through user actions, but it is associated with objects and assets in the platform. For instance, content, textbooks, and courses on the platform all have associated metadata such as date of creation and creator ID.



Every user action, such as CLICK, VIEW (page), DRAG, DROP, OPEN (content), CLOSE (content), or SCAN (QR code), generates telemetry events that are stored in the platform. The attributes embedded in an event capture the context in which the user action happened. For example, a CLICK event has attributes specifying whether the click happened in the app or on the portal, and whether it happened during content usage or outside it. While the data captured is very detailed, events are associated only with platform-generated anonymised user IDs to preserve the privacy of the user.

A telemetry specification defines the structure of all telemetry events, including the attributes possible for each event. As the CLICK example above shows, the values of the attributes vary depending on the context.
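To make the idea of an event and its context attributes concrete, the sketch below models a single CLICK event and reads a few context attributes from it. The field names (`eid`, `ets`, `actor`, `context`, `edata`) and every value are assumptions chosen for illustration; the telemetry specification remains the authority on the actual structure.

```python
import json

# Hypothetical CLICK event. Field names and values are assumptions for
# illustration only; consult the telemetry specification for the real schema.
SAMPLE_EVENT = json.loads("""
{
  "eid": "CLICK",
  "ets": 1700000000000,
  "actor": {"id": "anon-user-123"},
  "context": {"source": "app", "env": "content"},
  "edata": {"target": "play-button"}
}
""")


def describe_context(event):
    """Summarise where and how the user action happened."""
    context = event.get("context", {})
    return {
        "action": event.get("eid"),
        # Only an anonymised, platform-generated ID is expected here.
        "user": event.get("actor", {}).get("id"),
        "source": context.get("source"),
        "during_content_usage": context.get("env") == "content",
    }


print(describe_context(SAMPLE_EVENT))
```

Reading attributes with `.get` keeps the sketch tolerant of events that omit an optional attribute, which is likely to matter because attribute values vary with context.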

Telemetry data supports many types of analysis. Examples include understanding common user journeys, measuring user engagement levels on the platform, identifying daily, weekly, and seasonal patterns of usage, funnel analysis of app and portal usage, and measuring the effectiveness of platform assets.
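As one example of such an analysis, the sketch below counts distinct users per day from a list of events, which is a simple starting point for daily usage patterns. It assumes each event has been flattened to a dictionary with a millisecond timestamp `ets` and an anonymised `actor_id`; both names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_active_users(events):
    """Count distinct anonymised user IDs per UTC calendar day."""
    users_by_day = defaultdict(set)
    for event in events:
        # "ets" is assumed to be a Unix timestamp in milliseconds.
        day = datetime.fromtimestamp(event["ets"] / 1000, tz=timezone.utc).date()
        users_by_day[day.isoformat()].add(event["actor_id"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}


# Made-up events for illustration.
events = [
    {"ets": 1700000000000, "actor_id": "u1"},
    {"ets": 1700000600000, "actor_id": "u2"},
    {"ets": 1700000900000, "actor_id": "u1"},
    {"ets": 1700090000000, "actor_id": "u1"},
]
print(daily_active_users(events))
```

Grouping by week or month instead of by day would follow the same pattern with a different bucketing key.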

Telemetry is available in two forms: through an API that provides raw telemetry data, and through an API for querying a Druid (open source) database, which supports filtering and aggregation of the data. Platform metadata can be accessed through different APIs depending on the metadata needed.
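For the Druid route, a query is typically expressed as a JSON document. The sketch below builds a Druid native `timeseries` query that filters on one event type and counts rows per day. The data source name and the `eid` dimension are assumptions, and how the Sunbird API accepts such a query (endpoint, authentication, any wrapping) is not shown here because it depends on the deployment.

```python
import json


def build_daily_count_query(data_source, event_id, start, end):
    """Build a Druid native timeseries query counting rows per day."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "day",
        # Druid intervals are ISO-8601 "start/end" strings.
        "intervals": [f"{start}/{end}"],
        # "eid" is an assumed dimension name holding the event type.
        "filter": {"type": "selector", "dimension": "eid", "value": event_id},
        "aggregations": [{"type": "count", "name": "event_count"}],
    }


# "telemetry-events" is a hypothetical data source name.
query = build_daily_count_query(
    "telemetry-events", "CLICK", "2023-11-01", "2023-12-01"
)
print(json.dumps(query, indent=2))
```

Note that Druid's `count` aggregator counts stored rows, so if the data source uses rollup at ingestion time the result may differ from the number of raw events.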

Data analysis for Sunbird can be done at different levels, depending on the need.

  1. Sunbird provides a bundled Superset (open source) visualisation tool. The tool has access to telemetry available in the Druid database. Many common questions can be answered and metrics generated using Superset.
  2. Software can be written to retrieve raw telemetry data from the telemetry API or to retrieve summarised data by querying Druid. Additionally, platform metadata can be retrieved using other APIs. These can be combined and further processed to generate more complex metrics and analysis, and the results can be presented as visualisations using stand-alone tools to provide further insights.
  3. The most complex analysis combines telemetry and platform metadata with data from external organisational databases to provide comprehensive, multi-level organisational dashboards that include Sunbird usage but extend to other organisational metrics.
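The second level above, combining summarised telemetry with platform metadata, can be sketched as a simple join. The example assumes usage rows keyed by a `content_id` and a metadata lookup that includes a `creator_id`; these names and the sample data are hypothetical and stand in for whatever the telemetry and metadata APIs return.

```python
from collections import Counter


def views_by_creator(usage_rows, metadata_by_content):
    """Sum content views per creator by joining usage with metadata."""
    totals = Counter()
    for row in usage_rows:
        metadata = metadata_by_content.get(row["content_id"], {})
        # Content without metadata is grouped under "unknown" rather than dropped.
        creator = metadata.get("creator_id", "unknown")
        totals[creator] += row["views"]
    return dict(totals)


# Made-up summarised telemetry and platform metadata.
usage_rows = [
    {"content_id": "c1", "views": 10},
    {"content_id": "c2", "views": 5},
    {"content_id": "c3", "views": 7},
    {"content_id": "c9", "views": 2},
]
metadata_by_content = {
    "c1": {"creator_id": "creator-a"},
    "c2": {"creator_id": "creator-b"},
    "c3": {"creator_id": "creator-a"},
}
print(views_by_creator(usage_rows, metadata_by_content))
```

Keeping unmatched content under an explicit `unknown` key makes gaps between the two data sources visible instead of silently shrinking the totals.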

The skills needed for analysis form a continuum, from visualisation using Superset to full development of large, complex, organisation-level data solutions, depending on the level of analysis described above. Analysts working at any level need a deep understanding of telemetry attributes and their relationship with user actions and user activity flows. This is very important because there are over one hundred telemetry attributes that can be combined in different ways to generate insights.

Beyond data analysis, Sunbird users can configure elements of the platform related to data. Additional configuration increases the initial setup time but makes the generated data more contextual to the use case, which eases downstream analysis. Configuration is an engineering activity carried out during initial setup and is not part of ongoing data analysis.