Learning Analytics 101

Emerging from research into the visualisation of argument construction, the analysis of learner interactions within networks has become widely recognised in recent years as a rich and effective means of providing feedback on learner progress (Najjar, Duval and Wolpers, 2006) – facilitating personalised learning (Beck and Woolf, 2000), developing collective intelligence (De Liddo, et al., 2012) automating metadata annotation (Downes, 2004), and offering opportunities for enhanced discoverability (Siemens, 2012).

Learning Analytics (LA) is a relatively new area of research that is comparable with other fields, such as Big Data, e-science, Web analytics, linguistic analysis and Educational Data Mining (EDM). All of these fields use large collections of in-depth data to identify patterns. While EDM and LA have many similarities, EDM tends to focus on analysing metrics with the aim of building prediction models (e.g. Kizilcec, Piech and Schneider, 2013; Wen, Yang and Rosé, 2014), while LA inclines to data analysis for developing learning processes. Both applications of technology have the potential to disrupt and have critical implications for future teaching and learning practice, with far reaching, but little understood outcomes.

The underlying assumptions of LA are based on the belief that Web-based proxies for behaviour can be used as evidence of knowledge, competence and learning. Through the collection and analysis of “trace data‟ (e.g. learners‟ search profiles, their website selections, and how they construct, use and move information on the Web – Stadtler and Bromme, 2007; Greene, Muis and Pieschl, et al., 2010) learning analysts explore “how students interact with information, make sense of it in their context and co-construct meaning in shared contexts” (Knight, Buckingham Shum and Littleton, 2014:10). LA methods that focus on discussion forums include processes that identify learners’ attention, sentiment analysis (agreement or disagreement), learner activity, and relationships between learners within forums (De Liddo, et al., 2011).

Design of LA instruments is not neutral, but inevitably reflects the ideology, epistemology and pedagogical assumptions of the designers. Data are not value free; they require interpretation and are subject to “interpretative flexibility” as much as any other technological development (Collins, 1983; Hamilton and Feenberg, 2005). Historically, information and communication technology (ICT) interventions in education have been based on objectivist assumptions that learners’ ability to represent or mirror reality are key to judging evidence of knowing and learning. While still maintaining a strong position in summative assessment, over the past thirty years the assumptions underlying objectivism have been challenged by a growing body of constructivist thought which holds that the key to understanding how knowledge is built is through examining the interpretive process of learning (Jonassen, 1991). The practice of Learning Analytics broadly adheres to either an objectivist perspective, which prioritises the use of trace data to make evaluations of knowledge acquisition (assessment of learning), or a constructivist position which values the provision of feedback to facilitate improved learner self-awareness (assessment for learning).


Learning Analytics provides some evidence that awareness of peer feedback improves collaboration (Phielix, et al., 2011) and a key method for providing feedback is through drawing attention to useful interaction metrics through visualisation techniques. Duval (2011) asserts that data visualisation “dashboards‟ can provide useful feedback mechanisms for learners and educators which can aid their evaluation of learning resources, and which may lead to improved discovery of content that is better suited to their needs.

For example Murray et al. (2013) describe a prototype dashboard which aims to support learners’ online deliberations through the use of textual analysis to identify and monitor: reflection, questioning, conceptualising, peer interaction as well as other social awareness metrics. Equipped with such a dashboard, facilitators may monitor common online forum problems like off-topic conversation, conversation dominated by specific contributors and high emotional content.

Duval (2011) asserts that “one of the big problems around learning analytics is the lack of clarity about what exactly should be measured” (2011:15) and suggests that “typical measurements…of time spent, number of logins, number of mouse clicks, number of accessed resources…” (2011:15) are not adequate metrics for finding out how learning being accomplished. Visualising other data sources including “emotion and stress analytics” (Verbert, et al., 2014:1512) may be relevant to enhance reflection and monitoring.

Ethical Issues

Learning analytics involves common data mining techniques and as such may have potential problems with ethical values like privacy and individuality. Data mining makes it difficult for an individual to control how their information is presented or distributed. Van Wel and Royakkers (2004) have identified two main forms of data mining: “content and structure mining‟ and “usage mining‟:

“Content and structure mining is a cause for concern when data published on the Web in a certain context is mined and combined with other data for use in a totally different context. Web usage mining raises privacy concerns when Web users are traced, and their actions are analysed without their knowledge.” (van Wel and Royakkers, 2004:129).


Critics have focused on a number of problems with the outcomes of analysing learning. The reliable validation of human and automatic annotation is problematic and unresolved (Rourke, et al., 2003; de Wever, et al., 2006); crude feedback mechanisms can lead to efforts to “game the system‟, so that educators design learning objects to elicit positive responses regardless of the overall benefit to learners; analytics can lead to learner-dependence on feedback rather than their own understanding, and the ethical implications of combining and representing data are not fully comprehended (Shum and Ferguson, 2012).


  • Beck, J. E., and Woolf, B. P. (2000). High-level Student Modeling with Machine Learning. In G. Gauthier, C. Frasson and K. VanLehn (eds.), Proceedings of 5th International Intelligent Tutoring Systems Conference, ITS 2000, 584-593. June 19-23, 2000, Montréal, Canada.
  • Collins, H. M. (1983). An Empirical Relativist Programme in the Sociology of Scientific Knowledge. In K. Knorr-Cetina and M. Mulkay (eds.), Science Observed. Perspectives on the Social Study of Science, 85-113. London: Sage Publications.
  • De Liddo, A., Buckingham-Shum, S., Quinto, I., Bachler, M., and Cannavacciuolo, L. (2011). Discourse-centric Learning Analytics. In Proceedings of the 1st International Conference on Learning Analytics and Knowledge, 23–33. February 27 – March 1, 2011, Banff, Alberta.
  • De Liddo, A., Buckingham Shum, S., Convertino, G., Sándor, Á. and Klein, M. (2012). Collective intelligence as community discourse and action. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work Companion (CSCW ’12). ACM, New York, NY, USA, 5-6.
  • de Wever, B., Schellens, T., Vallcke, M. and van Keer, H. (2006). Content Analysis Schemes to Analyze Transcripts of Online Asynchronous Discussion Groups: A Review. In Computers and Education, 46(1), 6-28.
  • Downes, S. (2004). Resource Profiles. In Journal of Interactive Media in Education, 5, 1–32. Special Issue on the Educational Semantic Web. [Online] Available at: http://www-jime.open.ac.uk/2004/5 [Accessed on 5 September 2014].
  • Duval, E. (2011). Attention please! Learning Analytics for Visualization and Recommendation. In Proceedings of LAK11: 1st International Conference on Learning Analytics and Knowledge, 9–17. February 27-March 1, 2011, Banff, Alberta.
  • Greene, J. A., Muis, K. R., and Pieschl, S. (2010). The Role of Epistemic Beliefs in Students‟ Self-Regulated Learning with Computer-Based Learning Environments: Conceptual and Methodological Issues. In Educational Psychologist, 45(4), 245–257.
  • Hamilton, E., and Feenberg, A. (2005). The Technical Codes of Online Education. In Techné: Research in Philosophy and Technology, 9(1).
  • Jonassen, D. H. (1991). Objectivism Versus Constructivism: Do We Need a New Philosophical Paradigm? In Educational Technology Research and Development, 39(3), 5-14.
  • Kizilcec, R. F., Piech, C., and Schneider, E. (2013). Deconstructing Disengagement: Analyzing Learner Subpopulations in Massive Open Online Courses. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, 170-179. ACM. April 08 – 12, 2013, Leuven, Belgium.
  • Knight, S., Buckingham Shum, S. and Littleton, K. (2014). Epistemology, Assessment, Pedagogy: Where Learning Meets Analytics in the Middle Space. In Journal of Learning Analytics, 1, 2, 23 – 47.
  • Murray, T., Wing, L., Woolf, B., Wise, A., Wu, S., Clark, L. and Xu, X. (2013). A Prototype Facilitators Dashboard: Assessing and Visualizing Dialogue Quality in Online Deliberations for Education and Work. In Proceedings of International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (EEE’13), 34-40. July 22-25, 2013 Las Vegas Nevada.
  • Najjar, J., Duval, E., and Wolpers, M. (2006). Attention Metadata: Collection and Management. In Proceedings of WWW2006 Workshop on Logging Traces of Web Activity: The Mechanics of Data Collection, 1-4. 23-26 May, 2006, Edinburgh.
  • Phielix, C., Prins, F. J., Kirschner, P. A., Erkens, G., and Jaspers, J. (2011). Group awareness of social and cognitive performance in a CSCL environment: Effects of a peer feedback and reflection tool. In Computers in Human Behavior, 27(3), 1087-1102.
  • Shum, S. B., and Ferguson, R. (2012). Social Learning Analytics. In Educational Technology and Society, 15(3), 3-26.
  • Siemens, G. (2012). Learning analytics: envisioning a research discipline and a domain of practice. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, 4-8. ACM. April 29 – May 02, 2012, Vancouver, BC, Canada.
  • Stadtler, M., and Bromme, R. (2007). Dealing with Multiple Documents on the WWW: The Role of Metacognition in the Formation of Documents Models. In International Journal of Computer-Supported Collaborative Learning, 2(2), 191–210.
  • van Wel, L., and Royakkers, L. (2004). Ethical Issues in Web Data Mining. In Ethics and Information Technology, 6(2), 129-140.
  • Verbert, K., Govaerts, S., Duval, E., Santos, J. L., Van Assche, F., Parra, G., and Klerkx, J. (2014). Learning dashboards: an overview and future research opportunities. In Personal and Ubiquitous Computing, 18, 6 1499-1544.
  • Wen, M., Yang, D., and Rosé, C. P. (2014). Sentiment Analysis in MOOC Discussion Forums: What does it tell us? In Proceedings of the 7th International Conference on Educational Data Mining (EDM 2014), 130-137. July 4 – 7, 2014, London, UK.

DAL MOOC – Week 1 Beginning

I’ve just started the edX Data, Analytics, and Learning MOOC (rather late – but I’m catching up) which has involved getting the hang of ‘Hangouts’. These are informal chats between experts which can take a while to get going, but inside these video conversations there are nuggets of extremely useful stuff. So in the spirit of the ‘revise/remix’ ethos of the course I’ve started to edit them into ‘bitesize’ chunks (using the free version of Lightworks). The first two are from week 1 where George Siemens (Athabasca University), Carolyn Rosé (Carnegie Mellon University), Dragan Gašević (Athabasca University) and Ryan Baker (Columbia University) give their definitions of Learning Analytics and answer the question “what do you do when you do Learning Analytics?”.

Personally, I can’t wait to get to Dr Rose’s section as her work on Discourse Analysis sounds right up my street, but I’m also very interested in Social Network Analysis  which will be covered by Dr Gasevic, and Prediction Modeling which will be led by Dr Baker.

There are a range of open source tools used on this course:
Lightside for Discourse Analysis
Gephi for Social Network Analysis, and
Rapid Miner for Prediction Modeling

The course also encourages learners to use the social networking aggregation and learning support tool, ProSolo, provides an introduction Tableaux – “a good tool to get started with” data analysis and visualisation – and shares a load of cleaned and anonymised datasets for us to play with.

Early days

Beetham031114Poster Session at ILiAD Conference 2014/Sue White © 2014/CC BY 2.0

A recent highlight was the ILIaD Conference, held at the University of Southampton, where I had the opportunity to discuss my summer research project with Digital Literacies expert, author of Rethinking Learning for the Digital Age, and keynote speaker, Helen Beetham during the poster session.

These early days in my life as a PhD research student seem to have been mainly taken up with training. I undertook an excellent Demonstrator training session at the end of September, which has led to some paid work evaluating first year student presentations throughout November. I have also undertaken the very useful Public Engagement and Presenting Your Research training as well as the Successful Business Engagement programme, Finding Information to Support Your Research and the Lifelong Learning PGR Training Package.

Public Engagement/Tim O'Riordan ©2014/CC BY 2.0

Public Engagement/Tim O’Riordan ©2014/CC-BY 2.0

Induction into the Web and Internet Science research group provided an opportunity to find out what my lab colleagues are mainly up to and to do some lightweight research into Provenance (see: WAIS Provenance Project).

I attended a seminar on the Portus MOOC (the focus of my summer dissertation project) and met with MOOC team leader and dedicated online learning exponent, Graeme Earl. It was great to hear Graeme talking about how he approached designing the course and using some of the key terms from the model I used for coding attention to learning in my project. He also raised issues about managing high numbers of learners comments. How do we scale MOOCs, engage larger numbers of learners and still pay attention to scaffolding? My hope is that my research will provide some support in this area.

I joined the PEGaSUs e-learning research group at the end of October. This is a interdisciplinary group of research students from diverse backgrounds who have a shared interest in online learning. Topics for discussion included teaching in emerging economies, blended learning, measuring teacher performance, communities of practice, e-portfolios, and (of course) Massive Open Online Courses. Post-meeting I set up a (closed) Facebook group to help keep the discussion going between meetings. I think PEGaSUs stands for Postgraduate E-learning Group at Southampton University (looks about right).

Finally, The British Computing Society Introduction to Badges for Learning event run by Sam Taylor and Julian Prior at Southampton Solent University at the end of October provided an excellent overview of this emerging method for enhancing learner engagement.

My growing to-do list is mainly taken up with the tasks involved in reviewing, editing and generally preparing my summer project for publication. This involves:

  • Finding suitable journals/conferences that may be interested in publishing my work.
  • Looking for what’s missing in my dissertation.
  • Interrogating my analysis (Is it watertight? Does it hold up under scrutiny?).
  • Reading more papers on learning analytics and content analysis.
  • Apply my analysis to other data sets (possibly the Web Science MOOC comment data).
  • Getting a paper ready for submission by the end of February 2015.

MSc Summer Project Poster

CanYouTell_PotraitA1FINALMy MSc Dissertation Poster/Tim O’Riordan ©2014/Creative Commons by-nc-nd License.

In my last post (several months ago) I set out a loose plan for my summer dissertation project. Now the dissertation has been submitted and I’ve produced this poster to sum up it’s 14,000 words – or at least its key points. It had it’s first outings at the Barry Wellman Distinguished Lecture event last week, and at the Institute for Learning, Innovation and Development (ILiAD) Conference, and a Public Engagement and Outreach Workshop earlier this week – all of which were held at the University of Southampton.

What are my key findings?

Essentially, after analysing comments associated with learning objects on a Massive Open Online Course (MOOC) in an attempt to identify ‘attention to learning’, I discovered that the coding model I applied to comments was good at identifying where there was not much learning activity going on, and slightly less impressive at identifying attention to learning. While there is a lot of work to do on this area, results so far have been encouraging, and I’m looking at how the usefulness (or otherwise) of my approach can be tested further.

Why is this important?

Making education and training more widely available is vital for human development and the Web has a significant part to play in delivering these opportunities. Running a successful online learning programme (e.g. a MOOC) should involve managing a great deal of learner interaction – answering questions, making suggestions, and generally guiding learners along their paths. But coping effectively with high levels of engagement is time intensive and involves the attention of highly qualified (and expensive) teachers and educational technologists. My hope is that through my research an automated means of showing how well and to what extend learners are attending to learning can be developed that will make a useful contribution to managing online teaching and learning.

Exams, Dissertation Topic and Authorship Issues

University of Southampton libraryUniversity of Southampton library/Jessie Hey © 2006/CC BY 2.0

Phew! Examinations and essay writing are now over for my MSc in Web Science course and I’m now preparing to start my summer dissertation project. Two supervisors have agreed to oversee the progress of my work, the main research question of which is: can comments related to Web-based learning objects be used to good effect in the evaluation of these objects?

This is essentially a learning analytics research project and my initial plan is to collect a suitably large dataset from FutureLearn MOOC’s and analyse this data using social network analysis and sentiment analysis tools. I aim to examine the learning objects and the topics discussed, the language used and what the sentiment polarity is towards these topics and objects. The objective is to identify characteristics of distinct participant roles, look for similarities in language and evaluate these in terms of established pedagogical frameworks (e.g. Dial-e, DialogPLUS) to support relevant schema.org descriptions (e.g. educationalFramework).

I’m in the process of working out exactly what I’m going to do and how I’ll do it. Because it involves ‘scraping’ web sites for people’s opinions, there are ethical issues as well as theorectical and practical hurdles to be overcome. However, I believe that this type of data can be profitably used to assist with learning object evaluation and think it will be useful to find out if there’s any evidence to back this opinion up.

The ‘authorship issues’ referred to in the title of this post aren’t strictly related to my studies as they have arisen from my previous employment as an Advisor at Jisc Digital Media. However, having just discovered that articles I wrote for my old employer are now being attributed to another person, I have been pondering the ‘web sciencey’ issues raised by this – like trust, provenance, the use of metadata and the nature of authorship in the digital age. By way of an example of what’s happened, here’s a link to an article I wrote about Khan Academy videos in March 2011 as preserved on the Internet Archive’s Wayback Machine (correctly attributed) – and here it is currently published on the Jisc Digital Media Blog (incorrectly attributed).

Joe Chernov discusses the issue of byline re-attribution in his post: Creators vs Corporations: Who Owns Company Content? All very interesting, and a topic that I will return to in a future post.