Here are some things we want from the Open edX analytics platform, gleaned from some of our clients’ use cases and requests.
But they’re all pretty substantial changes, and so we’d need to get several supporters together to make them happen.
What do you think? Does anyone else want these features?
Real time (or near real time) updates
The analytics pipeline is tuned for massive daily data updates against log data rotated in batch. Some simple data-processing tasks that could run in real time, or in seconds or minutes on a small instance, end up taking 30-90 minutes per update and racking up big AWS bills, because of all the overhead added by Hadoop map/reduce.
Though we can update the pipeline tasks to allow for more frequent data partitions, this still relies on rotated data logs, and a heavyweight process for crunching the numbers.
Can we use a stream-based processing approach instead, like what is described here?
https://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
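To make the idea concrete, here is a minimal sketch of what stream-style processing could look like: instead of a map/reduce job over rotated logs, a small consumer tallies events incrementally as log lines arrive. The JSON-lines shape and the `event_type` field mirror Open edX tracking logs, but the function itself is illustrative, not part of any existing pipeline.

```python
import json
from collections import Counter

def count_events(lines):
    """Incrementally tally event types from an iterable of JSON log lines.

    Works on any stream (file, socket, message queue consumer), so counts
    are available as soon as lines arrive, with no batch rotation step.
    """
    counts = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the whole run
        counts[event.get("event_type", "unknown")] += 1
    return counts

# Example input, shaped loosely like tracking-log entries:
sample = [
    '{"event_type": "problem_check", "username": "a"}',
    '{"event_type": "play_video", "username": "b"}',
    '{"event_type": "problem_check", "username": "c"}',
    'not json',
]
print(count_events(sample))
```

Because the tallies update per line, the same loop could feed a dashboard every few seconds on a single small instance, which is the kind of workload the linked article shows plain tools handling far faster than a Hadoop cluster.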
Lightweight deployment
Suitable for small instances with hundreds or thousands of users.
Currently, the best way to support small deployments is a single-instance Ubuntu server, but it is still quite complex to set up.
Flexible reporting
Because the analytics pipeline must scale to millions of users, it is limited to the minimum useful datasets that can be scalably reported. But smaller instances and blended learning scenarios may want to see learner-specific activity and data which simply could not be displayed for a large-scale course.
Can we have dynamically-generated reports for all sorts of datasets?
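As a sketch of the kind of learner-level view a small instance might want, here is a tiny report generator that rolls raw events up per learner. The event schema (`username`, `event_type` keys) is hypothetical; the point is that for hundreds of learners this is trivial to compute on demand, even though it would be unmanageable to display for a course with millions.

```python
from collections import defaultdict

def learner_activity_report(events):
    """Summarise activity per learner from raw event dicts.

    Assumes a hypothetical schema where each event carries a 'username'
    and an 'event_type'. Returns {username: {event_type: count}}.
    """
    report = defaultdict(lambda: defaultdict(int))
    for event in events:
        report[event["username"]][event["event_type"]] += 1
    # Convert nested defaultdicts to plain dicts for display/serialisation.
    return {user: dict(kinds) for user, kinds in report.items()}

events = [
    {"username": "alice", "event_type": "play_video"},
    {"username": "alice", "event_type": "problem_check"},
    {"username": "bob", "event_type": "play_video"},
]
print(learner_activity_report(events))
```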
Simpler contributions
A dedicated, small analytics reporting application would be much easier for smaller organisations to contribute to, and to customise for their own individual organisational needs.
Can we enhance the Analytics API offerings so that the community can build and share custom analytics applications, tailored to specific use cases?
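A community application built this way could be very small indeed. The sketch below shows a client for a token-authenticated course-activity endpoint; the base URL and the exact path shape are illustrative assumptions, not the actual Analytics Data API contract.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import Request, urlopen

API_BASE = "https://analytics.example.com/"  # hypothetical deployment URL

def course_activity_url(course_id):
    """Build the URL for a course's activity data.

    The /api/v0/... path shape is illustrative only; a real client would
    take it from the deployed API's documentation.
    """
    return urljoin(API_BASE, f"api/v0/courses/{quote(course_id, safe='')}/activity/")

def fetch_activity(course_id, token):
    """Fetch and decode the activity payload (requires a live server)."""
    req = Request(
        course_activity_url(course_id),
        headers={"Authorization": f"Token {token}"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

Everything organisation-specific then lives in a few dozen lines on top of the API, which is far easier for a small team to customise than the full pipeline.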