Analytics -- lighter, faster, cheaper

Here are some things we want from the Open edX analytics platform, gleaned from some of our clients’ use cases and requests.

But they’re all pretty substantial changes, and so we’d need to get several supporters together to make them happen.

What do you think? Does anyone else want these features?

Real time (or near real time) updates

The analytics pipeline is tuned for massive daily data updates against log data rotated in batch. Some simple data processing tasks that could run in real time, or take only seconds or minutes on a small instance, end up taking 30-90 minutes to update and racking up big AWS bills because of all the overhead added by Hadoop map/reduce.

Though we can update the pipeline tasks to allow for more frequent data partitions, this still relies on rotated data logs, and a heavyweight process for crunching the numbers.

Can we use a stream-based processing approach instead, like what is described here?
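To make the contrast concrete, here is a toy sketch of the stream-based idea (not tied to any particular streaming framework, and the event shapes are hypothetical): aggregates are updated one event at a time as events arrive, instead of re-crunching rotated logs in a batch job.

```python
from collections import Counter

def consume(events, counts=None):
    """Update aggregate counts one event at a time, yielding a snapshot
    after each event -- the essence of a stream-based approach."""
    counts = counts if counts is not None else Counter()
    for event in events:  # in production, this loop would read from e.g. a Kafka topic
        counts[event["event_type"]] += 1
        yield dict(counts)

# With a stream, the latest snapshot is always seconds old, not hours:
snapshots = list(consume([
    {"event_type": "play_video"},
    {"event_type": "play_video"},
    {"event_type": "problem_check"},
]))
```

The same counting logic could back a near-real-time dashboard without any Hadoop overhead, which is the appeal for small instances.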

Lightweight deployment

Suitable for small instances with hundreds or thousands of users.

Currently, the best way to support small deployments is a single-instance Ubuntu server, but it is still quite complex to set up.

Flexible reporting

Because the analytics pipeline must scale to millions of users, it is limited to the minimum useful datasets that can be scalably reported. But smaller instances and blended learning scenarios may want to see learner-specific activity and data which simply could not be displayed for a large-scale course.

Can we have dynamically-generated reports for all sorts of datasets?

Simpler contributions

A dedicated, small analytics reporting application would be much easier for smaller organisations to contribute to, and to customise for their own individual organisational needs.

Can we enhance the Analytics API offerings so the community can create and share custom analytics applications, tailored to specific use cases?

Jill, I’m so glad you posted this. We’ve been working on a project that we’re releasing to the whole community that lines up well with what you’re describing here. (Not surprising, since I suspect a lot of us in the community are feeling the same pain points around analytics.)

Late this past fall, a few of our customers sponsored us to build a reporting tool that would fill in a lot of the “missing pieces” of Open edX – the site-wide and cross-course reports and analytics that most other LMSes offer, and people (rightfully) expect to be there. While it’s intended as a complement to Insights (that is, we’re not out to replace Insights), we wanted to provide this reporting data without requiring Insights. We spent the late fall and winter building out a first version, and while we’re still in development, we are going to be releasing this to the whole community.

Below, I’ve included the goals, non-goals, and the “Jobs to be Done” based on our discussions with our customers. I’ll share more info shortly (architecture, some InVision designs from our UX team, and our GitHub repo), but wanted to respond to your message to let you know we agree, we’ve been taking an initial stab at solving the problem, and we’re excited to get others in the community involved!



Goals

  • Meet the needs of customers to fully understand what is happening on their Open edX site.
  • Identify gaps in the current feature set and determine what we need to build.
  • Provide access to as many reports as possible without the need for Open edX Insights, due to the complexity and cost of running the Insights pipelines. These solutions will not preclude an Insights installation, but our goal is to provide a lightweight option for users who want relatively straightforward reports.
  • Open-source the new reporting features (the data aggregation, APIs, and front-end) so that anyone in the Open edX community can use them and contribute to the effort.

Non-goals

Don’t reinvent the wheel.

  • We are not building a fully-featured business intelligence (BI) tool. If users want to pull reporting data out into such a tool, they are welcome to do so via the APIs or a data dump, but our intention is to provide simple built-in reporting capabilities.
  • We are not building extensions to Open edX that make Open edX the system of record for non-LMS data. That is, while some extensions are possible and will be taken into account for reports (e.g. custom fields on a user profile), Open edX is not intended to be a general-purpose datastore, keeping information such as organizational hierarchy, actions done by users in other systems, etc. If users want to combine data, the APIs will let them pull Open edX data out into another system for processing, but we are not adding extensions and APIs that allow for expansion of the core data models.
  • We are not replacing Google Analytics, HotJar, and other click-analysis and user activity tools. These tools do their job well and are easily integrated into Open edX.

Jobs To Be Done (JTBD)
These are written from the viewpoint of the jobs our customers are trying to do, not dictating a solution or getting into implementation details. (Read more about the JTBD framework here and here.)

Head of program

  • I need to demonstrate return on investment (ROI) / return on learning (ROL).
  • I need data to help me plan resource allocation (human and capital) for ongoing courseware activities, marketing and outreach efforts, and increased investment in our learning platform.
  • I need data to prove to internal stakeholders (executive team, board) that we are achieving a positive return on investment over time.
  • I need data to prove to our external stakeholders (our customers, our funders) that they are receiving a positive return on their investment in learning.

Site administrator

  • I need to understand how a particular learner is doing across all of their courses.
  • I need to inform my team how key metrics are changing over time, for example enrollments and completions.
  • I need to know how many monthly active users we have on our site.

Instructor/course author

  • I need to know where students are getting stuck in my course, so I can communicate with them and then improve the course.

That is fantastic news, @abeals! Very much looking forward to seeing what you’ve built!

Especially the Instructor/course author space – it seems like edX are trying to phase out the Instructor Dashboard, or at least limit expansion in that area of the LMS. A purpose-built reporting tool that can help people managing their running courses would be a godsend.

Thank you for sharing!

@abeals I’m very excited to hear about your analytics project!

Could you please also check out our thoughts/plans around an XBlock Reporting Tool and let me know if you think that would overlap with your project, or be complementary? Essentially I’m wondering if your project scope includes reporting on specific individual user answers/submissions (what answer did each student give to fill-in-the-blank question 5 in course X) or not.

@abeals Thank you for taking this initiative. I can see the need for a lightweight analytics tool in the Open edX Community.

I also appreciate your intention to minimize changes to edx-platform by creating a pluggable Django app for this feature. That mindset aligns well with our goal of keeping only core concepts and features within edx-platform, while modular enhancements are plugged in without modifying the platform. In this regard, I’d like to point out the following:

  • Please see our recently implemented framework for Django App Plugins. With this framework, edx-platform automatically recognizes and installs registered LMS/Studio Django apps and their settings, urls, and signal handlers. So no changes to lms.env.json would even be needed.

  • As you may already be aware, edX Web Fragments can be used to implement front-end plugins. Depending on how you plan to enhance the edx-platform UI, we may first need to implement an architectural runway to support FED plugins in that part of the UI (if it doesn’t already exist). For example, if a new analytics tool is being added to DashboardX, we’ll want to first make sure DashboardX supports FED plugins - so DashboardX can automatically render the new tool without actually knowing about the tool.
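To illustrate the Django App Plugins point above, registration might look roughly like this. This is a sketch from the framework’s documented configuration pattern; the app name `my_analytics`, the URL regex, and the module paths are all hypothetical, so check the plugin framework’s docs for the exact configuration keys before relying on them.

```python
from django.apps import AppConfig

class MyAnalyticsAppConfig(AppConfig):
    name = "my_analytics"

    # The plugin framework reads this attribute to wire up urls, settings,
    # and signal handlers automatically -- no edits to lms.env.json or to
    # edx-platform code. The app itself is registered via a setup.py
    # entry point in the "lms.djangoapp" group.
    plugin_app = {
        "url_config": {
            "lms.djangoapp": {
                "namespace": "my_analytics",
                "regex": "^analytics/",
                "relative_path": "urls",
            },
        },
        "settings_config": {
            "lms.djangoapp": {
                "common": {"relative_path": "settings.common"},
            },
        },
    }
```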

In addition to our pluggability initiative, edX has also recently been looking at learning standards that we can support and integrate with. We are actively investigating possibilities of integrating with Learning Record Stores (LRSs) as standardized by xAPI. Although we are in the very early stages of this discovery, we could possibly unify our efforts if you also consider an LRS for your needs.
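For context, an xAPI “statement” (the record an LRS stores) is an actor/verb/object JSON document. A minimal builder might look like the sketch below; the field names follow the xAPI specification, while the learner email, verb ID, and activity ID are purely illustrative.

```python
def make_statement(actor_email, verb_id, activity_id):
    """Build a minimal xAPI statement dict, suitable for POSTing to an LRS."""
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{actor_email}"},
        # xAPI verbs are identified by IRIs; the display map is human-readable
        "verb": {"id": verb_id, "display": {"en-US": verb_id.rsplit("/", 1)[-1]}},
        "object": {"objectType": "Activity", "id": activity_id},
    }

stmt = make_statement(
    "learner@example.com",
    "http://adlnet.gov/expapi/verbs/completed",
    "https://lms.example.com/courses/course-v1:Org+Course+Run",
)
```

An Open edX tracking-log event could be mapped into this shape by an emitter, which is presumably where the two efforts would meet.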

edX, Architect


Hi Nimisha,

First, thanks for the additional information and links.

One of the challenges we’re working on as we develop our lightweight analytics project is automated testing, and specifically dependencies on components within edx-platform. I’m very interested in learning what testing strategies you and the greater community have for testing reusable Django apps/edx-platform plugins with such dependencies, for making the test environment tractable for everyone, and for making test development easier and faster.


The edx-enterprise and edx-completion Django applications are good examples of substantial Django apps designed to be installed into edx-platform. Here are some of the ways they deal with dependencies on code in edx-platform:

  • Wrap imports from edx-platform in try/except blocks, and assign them to None if unavailable so the importing module can still itself be imported cleanly.
  • In tests covering code that uses the stubbed out imports, mock the stubs to return what the test needs.
  • Add a few tests to edx-platform to verify that changes there don’t break the installed app. These can be made contingent on having the package installed, if it isn’t installed by default.
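A minimal sketch of the first two bullets follows; the `has_access` import path is illustrative of the kind of edx-platform-only dependency being stubbed, not a recommendation of any specific helper.

```python
# Pattern used by edx-enterprise/edx-completion style plugins: stub out
# edx-platform imports so the module still imports cleanly outside the platform.
try:
    from lms.djangoapps.courseware.access import has_access  # only exists inside edx-platform
except ImportError:
    has_access = None

def user_can_view(user, course):
    """Delegate to the platform helper when available; fail loudly otherwise."""
    if has_access is None:
        raise RuntimeError("This feature requires running inside edx-platform")
    return has_access(user, "load", course)
```

In the plugin’s own test suite, `has_access` would then be patched with `unittest.mock` to return whatever each test needs, which is the second bullet above.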

Moving forward, we’d like to extract more modules from edx-platform into separate packages so they can be used directly as dependencies; let us know if there are any specific modules that would be convenient to have broken out.

We also want to start using “contract tests”, where a package or service that integrates with one of the Open edX services can contribute tests that must be run by that service’s test suite to prevent accidental breakage of the “contract” for the integration points. For service-to-service integrations we’ll probably use something like pact-python, and for packages this could probably be done as an improvement to openedx.core.djangoapps.plugins.
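As a rough illustration of the contract-test idea (field names entirely hypothetical, and without pact-python yet), a package-level contract check could be as simple as a shared assertion that the consumer publishes and the provider’s test suite imports and runs:

```python
# The consumer package publishes the response shape it depends on...
EXPECTED_ENROLLMENT_FIELDS = {"course_id", "user_id", "mode", "is_active"}

def check_enrollment_contract(payload):
    """Return True if a provider payload satisfies the consumer's contract."""
    return EXPECTED_ENROLLMENT_FIELDS <= set(payload)

# ...and the provider's test suite runs the check against its own serializer
# output, so a field removal breaks the provider's build, not the consumer.
payload = {"course_id": "course-v1:Org+X+1", "user_id": 42, "mode": "audit", "is_active": True}
```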

Hi Braden,

I’m one of the developers of our lightweight analytics app. I’ll defer to @abeals to reply to you on specifics with XBlock reporting, but I can share what I’m working on toward an initial release.

I’m looking at the groundwork of querying/capturing and reporting some basic site-wide information across courses, so that there will be a REST API-driven dashboard in the LMS as a path to “reporting one-stop shopping”. Initial goals are to provide basic reporting on learner demographics (from the data available in the LMS), course enrollments, and quick access to the reports now in the per-course instructor dashboard.

Are you up for getting together with a video chat to discuss?


Hi Jeremy,

Thanks for the info and tips! I’ll look into those packages.

@johnbaldwin I’d love to get together for a chat. I’ll reach out to you on Slack to follow up.

Hi All,

I’m working on the architectural OEP for lightweight analytics, and I’m really interested in understanding better who has custom data retrieval/reporting/analytics needs that would be served by a lightweight analytics framework, and what those needs are. So please speak up!

Following on the features mentioned by the folks who have already posted in this thread: what are your needs with regard to real-time or near-real-time data retrieval, report generation, analytics, data visualization, and ease of customization for your own needs? What about deployment considerations?


@johnbaldwin We are currently in the exploratory phase of revising our testing strategy. I’ve asked Jeremy to respond to your question as he’s beginning to look into this. Let me know if you need any further direction on this. Thanks.

@johnbaldwin @abeals @Natea Thanks a lot for taking the time to discuss this project, and for working on it in the first place! It’s a great and really anticipated contribution, that has the potential to be very useful to a lot of people in the community.

Do you have a design document that you could share, so we could understand the details of what you are planning? It would help make sure that it could be useful to the larger community - not only to ensure that it can gather multiple users/contributors, but also to make sure that any changes you will need to make to the main code base of Open edX (like API changes?) can actually be merged upstream. Would it be useful to do the usual product/architecture review, at least for these shared parts?

Hi @antoviaque,

Thanks for asking. I don’t have a design document ready yet for the lightweight analytics app we’re developing. I will make this a priority and post the doc, or a link to it, hopefully next week.


@johnbaldwin That’s great - thank you! : )

Hi Everyone,

Here is the link to the design document:

I’ve also opened up the document to commenting. So please comment away!

cc @antoviaque

@johnbaldwin That’s great, thank you for posting this!

@Braden @jill_opencraft Would you want to schedule a review of this on our end?

@johnbaldwin Thanks a lot for posting the design document! This looks like a very promising starting point, and I’m looking forward to seeing the release of the first version. I reviewed the document and added a few comments.

@smarnach, thanks for your review comments in the doc and appreciate the encouragement!

I’ll follow up with replies by early next week at the latest (We’re wrapping up a sprint this week).

Hi @jill_opencraft,

Looks like lightweight analytics is on the topic agenda for the Developer Summit on Jun 1:

Will you be there?