Evaluation and automatic analysis of MOOC forum comments

My doctoral thesis is available to download via the University of Southampton’s ePrints site.


Moderators of Massive Open Online Courses (MOOCs) undertake a dual role. Their work entails not just facilitating an effective learning environment, but also identifying excelling and struggling learners, and providing pedagogical encouragement and direction. Supporting learners is a critical part of moderators’ work, and identifying learners’ level of critical thinking is an important part of this process. As many thousands of learners may communicate 24 hours a day, 7 days a week using MOOC comment forums, providing support in this environment is a significant challenge for the small numbers of moderators typically engaged in this work. In order to address this challenge, I adopt established coding schemes used for pedagogical content analysis of online discussions to classify comments, and report on several studies I have undertaken which seek to ascertain the reliability of these approaches and to establish associations between these methods and linguistic and other indicators of critical thinking. I develop a simple algorithmic method of classification based on automatically sorting comments according to their linguistic composition, and evaluate an interview-based case study in which this algorithm is applied to an ongoing MOOC. The algorithmic method achieved good reliability when applied to a prepared test data set, and when applied to unlabelled comments in a live MOOC and evaluated by MOOC moderators, it was considered to have provided useful, actionable feedback. This thesis provides contributions that help to understand the usefulness of automatic analysis of levels of critical thinking in MOOC comment forums, and as such has implications for future learning analytics research and e-learning policy making.
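
The abstract describes the classification method only in outline. As a rough illustration of what sorting comments by their linguistic composition could look like, the sketch below scores each comment on two simple indicators, word count and first-person pronoun use, and sorts comments by that score. The pronoun list, the equal weighting and the function names are all hypothetical choices for illustration; this is not the algorithm developed in the thesis.

```python
import re

# Hypothetical indicator list (an assumption): first-person pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours", "ourselves"}


def tokenize(comment: str) -> list[str]:
    """Lower-case word tokens; a straight apostrophe keeps contractions such as "i'm" whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", comment.lower())


def linguistic_score(comment: str) -> int:
    """Hypothetical score: word count plus number of first-person pronouns (equal weights)."""
    tokens = tokenize(comment)
    # "i'm" -> "i", "we're" -> "we", so contracted pronouns are still counted.
    pronouns = sum(1 for t in tokens if t.split("'")[0] in FIRST_PERSON)
    return len(tokens) + pronouns


def rank_comments(comments: list[str]) -> list[str]:
    """Return comments sorted from highest to lowest score (ties keep their original order)."""
    return sorted(comments, key=linguistic_score, reverse=True)
```

Calling rank_comments on a batch of forum comments puts longer, more personally reflective comments first. In practice the features and their weights would need to be calibrated against comments already coded by humans.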

The Cognitive Theories of Jean Piaget and L. S. Vygotsky


Jean Piaget and Lev Vygotsky

Jean Piaget and L. S. Vygotsky are key figures in the exploration and development of ideas regarding how humans construct knowledge out of experience. While both explored how a child’s interaction with their environment affected the development of cognition, Piaget’s theories contrast sharply with Vygotsky’s, with the former exploring internal processes within individual children and the latter focusing on the role of social influences on cognition. Despite these differences, since the mid-twentieth century their respective contributions to constructivist theories of knowledge have been hugely influential in national educational policies, curriculum design, and teacher training.

Although not specifically focused on teaching, Piaget’s research methods have influenced educational research, and his ideas on how children organise knowledge, develop in stages, and transition between stages as they adapt to new information have had a lasting impact on pedagogy. He proposed that how we learn to think derives from our perception of what we do. His research as a developmental psychologist focused on how children’s thinking develops from physical interactions into internalised logical concepts that can be applied to abstract cognition (Piaget, 1970).

Through observation and questioning of his own and his friends’ children, Piaget produced a body of theory built on three main concepts: schema, adaptation, and stages of development. With ‘schema’ Piaget describes the essential building blocks of intelligence, which are laid down early in human development. These are created from essentially pre-verbal sensorimotor activity by which humans understand the world, and are used to construct verbal thought and logic (Piaget, 1952).

According to Piaget, as children mature, these schemata change and integrate new knowledge through a process of adaptation, or ‘equilibration’, which involves three elements: assimilation, accommodation and equilibrium. ‘Equilibrium’ refers to a state of satisfaction a child feels regarding their current understanding; ‘assimilation’ describes the process of adjusting an experience to fit a child’s existing schemata or understanding; and ‘accommodation’ is the manner in which a child may change existing schemata as a result of new knowledge. While undergoing assimilation or accommodation, the child experiences cognitive conflict, or ‘disequilibrium’, which settles into a state of equilibrium once the more refined explanation is adopted.

Piaget mapped key steps in the maturation of intelligence and adaptation of schemata throughout infancy, childhood and adolescence to four age-related stages (Hopkins, 2011; Capel, 2016; McLeod, 2018). The first, ‘sensorimotor’ stage occurs between birth and about 24 months. Infants physically interact with their environment through their senses and create a series of schemata based on these observations to build an understanding of themselves and the world. The chief attainments during this stage are the ability to distinguish between self and others, and an awareness that objects continue to exist even when they are not visible.

The next ‘preoperational’ stage, between ages 2 and about 6, is characterised by the child’s improving ability to comprehend difference, categorise experiences and things into simple groups, and adopt symbolic means of expression (e.g. language and writing).

Piaget regarded the next, ‘concrete operations’ stage, spanning the years from 7 to around 11, as a key step in cognitive development. The process of accommodation adapts existing schemata to make sense of the growing accumulation of new experiences. It is at this stage that children begin to apply logical thinking to explain concrete, physical experiences, and develop the ability to use inductive reasoning based on observation.

The final, fourth stage of development, which Piaget named ‘formal operational’, marks the point of cognitive maturation between ages 11 and 15. Piaget asserts that young adolescents at this stage develop abstract and systematic thinking. They are able to form hypotheses and deduce the outcomes of possible future events based on logical consideration of external evidence as well as their own experience.

Like Piaget, Lev Vygotsky was a psychologist concerned with exploring and understanding how children’s activity influences their thinking. Unlike Piaget, however, he placed greater emphasis on the influence of language acquisition, undertook research in the field, and proposed practices that might promote and guide learning. Vygotsky’s research took place during a period of economic and social change in Russia (USSR), and his findings were strongly influenced by the political rhetoric of the time as well as the cultural and educational programmes undertaken to bring about the rapid industrialisation of the nascent Communist state (Glassman, 2001).

Vygotsky’s contributions to pedagogic theory are formed around three key ideas: social interaction, the more knowledgeable other (MKO), and the zone of proximal development (ZPD).

According to Vygotsky, “[a]ll the higher functions originate as actual relations between human individuals” (Vygotsky, 1979, p. 57), and an essential part of cognitive development occurs when interpersonal, social processes are transformed into intrapersonal ones. He illustrates this with the development of a child’s ability to point. An infant’s first attempts to grasp an object out of reach are unsuccessful. Eventually another person responds to these attempts, the child links her motivation towards the object with that person, and the grasping motion turns into a pointing gesture directed at others. This and other ‘sign operations’ (e.g. language) are then transformed and internalised as an essential part of a child’s cognitive development.

Of fundamental significance to this process is interaction and conversation with other, more knowledgeable people, and the impact this has on a child’s ability. Vygotsky called the difference between what a child can achieve when solving problems independently and what she can achieve with adult guidance or in collaboration with ‘more capable’ peers the “zone of proximal development” (Vygotsky, 1979). Measured in years of mental development, this gap between performance on tests based on independent work and on tests based on collaboration may vary considerably (Vygotsky gives an example in which it ranges from one to four years).

Vygotsky asserts: “the zone of proximal development defines those functions that have not yet matured but are in the process of maturation, functions that will mature tomorrow but are currently in an embryonic state” (Vygotsky, 1979, p. 86). For pedagogy, this understanding reinforces the teacher’s role as a guide in an active learning environment that highlights the collaborative role of learners in the construction of meaning. From a teacher’s point of view, understanding a child’s ZPD has a significant impact on how they introduce, discuss and develop topics in the classroom. While the ultimate aim is to reduce learners’ ZPD and foster improved independent cognition, from a child’s perspective this understanding allows her to achieve her potential in cooperation with others.

Although I was unaware of Piaget’s and Vygotsky’s theories during much of my teaching practice over the past 30 years, the ubiquity of constructivist theory in recent pedagogy and the ‘common sense’ nature of these theories have ensured that some awareness of these ideas has seeped into my practice. After all, it makes sense to encourage relevant discussion and peer support in class, and Western culture recognises different stages in a child’s development, as in the Catholic Church’s recognition of seven as “the age of discretion…when a child begins to reason” (Papal Encyclicals Online, 2017).

However, while these theories have had an important impact on teaching methods, they have not escaped criticism on both methodological and epistemological grounds (Shayer, Küchemann & Wylam, 1976; Driver, 1978). A key criticism of Vygotsky is the imprecision surrounding how a child’s ZPD is identified, how it accounts for a child’s current ability and motivation, and how development actually occurs (Chaiklin, 2003). Similarly, Schaffer (1986) asserts that Piaget’s concept of equilibration, which is fundamental to understanding how staged development works, is “an explanation based on intuition that remains impervious to empirical testing” (p. 763).

While acknowledging the importance of theory to practice, Capel (2016) suggests the adoption of Kolb’s experiential learning cycle (Kolb, 1984) as a model for developing strategies that “develop and change as the pupil becomes more experienced” (p. 711). She also advises that teachers evolve their own teaching methods and adapt them to meet new demands.

My observation of practice in secondary and post-16 education appears to bear this out. I have observed teachers encourage recall, remind learners of their prior knowledge, provide opportunities for peer learning, differentiate and adapt to different speeds of learning, and attempt to create settings where learners can discuss, work together and adjust their approaches to thinking; overall, these approaches seem to be effective.


Capel, S. A. (2016) ‘Helping pupils learn’, in Leask, M., Capel, S., and Turner, T. (eds) Learning to Teach in the Secondary School: A Companion to School Experience. 7th edn. London, UK: Routledge, pp. 694–715.

Chaiklin, S. (2003) ‘The Zone of Proximal Development in Vygotsky’s Analysis of Learning and Instruction’, in Kozulin, A. et al. (eds) Vygotsky’s Educational Theory and Practice in Cultural Context. Cambridge, UK: Cambridge University Press, pp. 39–64.

Driver, R. (1978) ‘When is a stage not a stage? A critique of Piaget’s theory of cognitive development and its application to science education’, in Educational Research, 21(1), pp. 54–61.

Glassman, M. (2001) ‘Dewey and Vygotsky: Society, Experience, and Inquiry in Educational Practice’, in Educational Researcher, 30(4), pp. 3–14.

Hopkins, J. R. (2011) The Enduring Influence of Jean Piaget [online] Psychological Science Observer. Available at: https://www.psychologicalscience.org/observer/jean-piaget [Accessed 12 September 2018].

Kolb, D. A. (1984) Experiential Learning: Experience as The Source of Learning and Development, Prentice Hall, Inc., pp. 20–38.

McLeod, S. A. (2018) Jean Piaget’s theory of cognitive development [online] Simply Psychology. Available at: https://www.simplypsychology.org/piaget.html.

Papal Encyclicals Online (2017) Quam Singulari: Decree of the Sacred Congregation of the Discipline of the Sacraments on First Communion. [online] Available at: http://www.papalencyclicals.net/pius10/p10quam.htm

Piaget, J. (1952) The Origins of Intelligence in Children. 2nd edn. New York: International Universities Press, Inc.

Piaget, J. (1970) Genetic Epistemology [online] American Behavioral Scientist, 13(3), pp. 459–480. Available at: http://journals.sagepub.com/doi/10.1177/000276427001300320 [Accessed 12 September 2018].

Saifer, S. (2010) ‘Higher Order Play and Its Role in Development and Education’, in Psychological Science & Education, (3), pp. 38–50.

Schaffer, H. R. (1986) ‘Child Psychology: The Future’, in Journal of Child Psychology and Psychiatry, 27(6), pp. 761–779.

Shayer, M., Küchemann, D. E. and Wylam, H. (1976) ‘The Distribution of Piagetian Stages of thinking in British Middle and Secondary School Children’, in British Journal of Educational Psychology, 46, pp. 164–173.

Vygotsky, L. S. (1962) ‘The Problem of Speech and Thinking in Piaget’s Theory’, in Hanfmann, E., Vakar, G., and Minnick, N. (eds) Thought and Language. Cambridge, MA: MIT Press.

Vygotsky, L. S. (1979) Mind in Society: The Development of Higher Psychological Processes. Edited by M. Cole et al. Cambridge, Massachusetts: Harvard University Press.

Automatic Essay Scoring

Papa bloggt! 255/365/ Dennis Skley ©2014/cc by-nd 2.0


The following is taken from my 18-month upgrade report, which I hope provides an interesting overview of a subject very close to my current area of research.

Grading, ranking, classifying, and recording student activity are fundamental activities in formal education wherever institutions or teachers need to know how students are developing, and where students require feedback, either formative or summative, on their progress. Computational approaches to measuring and analysing this activity hold the promise of relieving human effort and dealing with large amounts of data at speed, but they are controversial and demand a multidisciplinary perspective “involving not only psychometrics and statistics, but also linguistics, English composition, computer science, educational psychology, natural-language analysis, curriculum, and more” (Page, 1966, p. 88).

In the mid-1960s, as part of ‘Project Essay Grade’ (PEG), a number of experiments to assess the reliability of machine-based essay grading were undertaken, adopting a word and punctuation count method of “actuarial optimization” to “simulate the behaviour of qualified judges” (Page, 1966, p. 90). Using 30 features that approximated to values previously identified by human experts, PEG explored essays written by high school students and found statistically significant associations with human criteria. The strongest associations were with average word length (r = 0.51), use of words commonly found in literature (r = -0.48), word count (r = 0.32), and prepositions (r = 0.25) (p. 93). While costly in terms of the time taken to input the essays, these initial experiments were highly successful, showing a strong multiple correlation (r = 0.71) equivalent to that of human experts. In the face of hostility and suspicion from progressive as well as established interests, and hampered by the rudimentary computing facilities available at the time, further development of the project waned (Wresch, 1993).
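
PEG's “actuarial optimization” is usually understood as a multiple linear regression of human grades on countable surface features. The sketch below illustrates that idea with three of the features named above (word count, average word length and prepositions). The short preposition list, the function names and the toy usage are assumptions for illustration; this is not PEG's actual 30-feature model.

```python
import re

import numpy as np

# Hypothetical, deliberately small preposition list; PEG's real word lists are not given here.
PREPOSITIONS = {"in", "on", "at", "of", "to", "for", "with", "by", "from", "about", "into", "over"}


def peg_features(essay: str) -> list[float]:
    """Three surface features of the kind PEG used: word count, mean word length, prepositions."""
    words = re.findall(r"[A-Za-z']+", essay)
    n = len(words)
    mean_len = sum(len(w) for w in words) / n if n else 0.0
    preps = sum(1 for w in words if w.lower() in PREPOSITIONS)
    return [float(n), mean_len, float(preps)]


def fit_grader(essays: list[str], human_grades: list[float]):
    """Regress human grades on the features (with an intercept) by least squares.

    Returns the fitted weights and the multiple correlation R between
    predicted and human grades on the training essays.
    """
    X = np.array([peg_features(e) for e in essays])
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = np.asarray(human_grades, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ weights
    multiple_r = float(np.corrcoef(predicted, y)[0, 1])
    return weights, multiple_r


def predict_grade(weights: np.ndarray, essay: str) -> float:
    """Apply fitted weights to a new essay."""
    return float(np.r_[1.0, peg_features(essay)] @ weights)
```

With only a handful of training essays R is inflated, because the model can fit the training grades closely. A fairer estimate of agreement with human judges would compare predict_grade against human grades on held-out essays.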

As computers became ubiquitous and as software improved in the decades that followed these initial experiments, PEG was revived and applied to large-scale datasets. These experiments resulted in algorithms that were shown to surpass the reliability of human expert rating (Page, 1994). In recent years the focus of developing automated essay scoring (AES) algorithms has shifted from faculty to the research and development departments of corporations. AES has been successfully marketed, and different systems are currently used to assess students’ writing in professional training, formal education, and Massive Open Online Courses, primarily in the United States (Williamson, 2003; National Council of Teachers of English, 2013; Whithaus, 2015; Balfour, 2013).

While the details of proprietary AES algorithm design are a matter of commercial confidentiality, systems continue to be based on word and punctuation counts and word lists, with the addition of Natural Language Processing techniques (Burstein et al., 1998), Latent Semantic Analysis (Landauer, Foltz and Laham, 1998), and Machine Learning methods (Turnitin, LLC, 2015; McCann Associates, 2016).
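
To make Latent Semantic Analysis less abstract, here is a minimal sketch of its core step: build a term-by-document count matrix, take a truncated singular value decomposition, and compare documents by cosine similarity in the reduced space. In an essay-scoring setting one might compare a student essay with reference essays this way. The function names and example documents are hypothetical, and the sketch skips the term weighting (such as log-entropy) that full LSA normally applies; it does not describe how any commercial system works.

```python
import numpy as np


def term_document_matrix(docs: list[str]) -> np.ndarray:
    """Rows are vocabulary terms, columns are documents; entries are raw counts."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({w for words in tokenised for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, words in enumerate(tokenised):
        for w in words:
            m[index[w], j] += 1
    return m


def lsa_document_vectors(docs: list[str], k: int = 2) -> np.ndarray:
    """Project documents into a k-dimensional latent space via truncated SVD."""
    m = term_document_matrix(docs)
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    k = min(k, len(s))
    # Row j is document j expressed in the top-k latent dimensions (V_k S_k).
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

The choice of k controls how aggressively word co-occurrence patterns are merged. Real LSA systems use hundreds of dimensions trained on large corpora, not the two used here.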

Controversy and criticism of AES have focused on the inability of machines to recognise or judge the variety of complex elements associated with good writing, the training of humans to mimic computer scoring, over-emphasis on word count and flamboyant language, and the ease with which students can be coached to ‘game the system’ (National Council of Teachers of English, 2013; Perelman, 2014).

However, many of these criticisms are levelled at the widespread application of computational methods to replace human rating, criticisms which were clearly addressed early in the development of AES. Page argued that computational approaches are based on established experimental methods that privilege “data concerning behaviour, rather than internal states, and the insistence upon operational definitions, rather than idealistic definitions” (Page, 1969, p. 3), and that machine grading simply replicated the behaviour of human experts. In response to arguments that machines were not capable of judging creativity, Wresch cites Slotnick’s support for the use of AES to indicate deviations from norms and highlight unusual writing, which could then be referred for further human assessment (Wresch, 1993). In recent work exploring the use of automated assessment in MOOCs, Balfour recognises the limitations of AES in assessing unique writing (e.g. individually selected topics, poetry, original research) and suggests the use of computational methods to correct mechanical writing problems, combined with a final human peer review (Balfour, 2013).
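
Slotnick's suggestion of using machines to flag deviations from the norm, rather than to assign final grades, can be sketched very simply. The example below standardises each feature across a batch of essays and flags any essay whose value on some feature lies more than a chosen number of standard deviations from the batch mean, so it can be referred to a human marker. The z-score rule, the default threshold of 2.0 and the function name are assumptions made for illustration, not a method described by Wresch or Slotnick.

```python
import numpy as np


def flag_unusual(features, threshold: float = 2.0) -> list[int]:
    """Return indices of essays (rows) whose value on any feature lies more than
    `threshold` population standard deviations from the batch mean."""
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant features would otherwise divide by zero
    z = (x - mean) / std
    return [i for i, row in enumerate(np.abs(z)) if bool((row > threshold).any())]
```

One design caveat: with the population standard deviation, a single outlier in a batch of n essays can reach at most |z| = sqrt(n - 1), so in very small batches a high threshold can never be triggered.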


  • Balfour, S. P. (2013) ‘Assessing writing in MOOCS: Automated essay scoring and Calibrated Peer Review’. In Research & Practice in Assessment, 8, pp. 40–48.
  • Burstein, J., Braden-Harder, L., Chodorow, M., Hua, S., Kaplan, B., Kukich, K., Lu, C., Nolan, J., Rock, D. and Wolff, S. (1998) Computer analysis of essay content for automated score prediction. Report for the Educational Testing Service.
  • Landauer, T. K., Foltz, P. W. and Laham, D. (1998) ‘An introduction to latent semantic analysis’. In Discourse Processes, 25(2 & 3), pp. 259–284.
  • McCann Associates (2016) IntelliMetric®. [Online] Available at: http://www.mccanntesting.com/products-services/intellimetric/ (Accessed: 1 March 2016).
  • National Council of Teachers of English (2013). NCTE Position Statement on Machine Scoring. [Online] Available at: http://www.ncte.org/positions/statements/machine_scoring (Accessed: 1 March 2016).
  • Page, E. B. (1966) ‘Grading Essays by Computer: Progress Report’. In Invitational Conference on Testing Problems, 29 October, 1966. New York: Educational Testing Service, pp. 87–100.
  • Page, E. B. (1994) ‘Computer grading of student prose, using modern concepts and software’. In The Journal of Experimental Education. Taylor & Francis, 62(2), pp. 127–142.
  • Perelman, L. (2014) ‘When “the state of the art” is counting words’. In Assessing Writing. Elsevier Inc., 21, pp. 104–111.
  • Turnitin, LLC (2015) Turnitin Scoring Engine FAQ. [Online] Available at: https://guides.turnitin.com/Turnitin_Scoring_Engine/Turnitin_Scoring_Engine_FAQ (Accessed: 1 March 2016).
  • Whithaus, C. (2015) Algorithms at the seam: machines reading humans + / -, Media Commons. [Online] Available at: http://mediacommons.futureofthebook.org/question/what-opportunities-are-available-influence-way-algorithms-are-programmed-written-executed-6 (Accessed: 1 March 2016).
  • Williamson, M. M. (2003) ‘Validity of automated scoring: Prologue for a continuing discussion of machine scoring student writing’. In Journal of Writing Assessment, 1(2), pp. 85–104.
  • Wresch, W. (1993) ‘The Imminence of Grading Essays by Computer – 25 Years later’. In Computers and Composition, 10(2), pp. 45–58.

Video production

While I spend most of my time working on my PhD research, because of my previous experience as a media producer I’m occasionally asked to produce videos that support the work of the Web Science Institute. Last month I produced a video showing the type of work we do here at Southampton (for display on a large screen in a public space in our School), as well as a short piece featuring Professor Dame Wendy Hall’s reflections on the 10th anniversary of Twitter.

PhD Research Update


Banna Beach at Sunset/Andrew Bennett ©2009/cc-by 2.0

It’s been quite a while since I posted, for which I partly blame writing up the second stage of my research for publication and for my 18-month upgrade, plus taking on a part-time role as Web Science Trust project support officer.

I handed in my upgrade a few weeks ago and had a viva to defend my thesis last week. The 18-month viva is not as intense as the final grilling you get at the end of the PhD, but provides, as the University of Southampton website says, “a great opportunity to talk about your work in-depth with experts in your field, who have read, and paid great attention to, your work”. This is true, but I also found it quite unnerving, as it made me realise I still had a long way to go before I could be confident in my thesis. Despite what I thought was a fairly lacklustre performance, I somehow managed to pass and am now in the final stretch, working towards my PhD hand-in next year. My final piece of work includes a fairly complex and challenging Machine Learning experiment and a series of interviews with MOOC instructors. More of this later.

Going back to my last experiment, it involved a large-scale content analysis of MOOC discussion forum comments, which I wrote about in a previous post. Between last November and January this year I recruited and trained a group of 8 research assistants to rate comments in MOOC discussion forums according to two content analysis methods. Overall, 1500 comments were rated, and correlations of various strengths were established between the two analysis methods, and between the methods and linguistic indicators of critical thinking. The outputs have provided a useful basis for the next stage: developing a method to automate comment rating that approximates human rating.
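
The post does not say which correlation coefficient was used. As an assumption, the sketch below computes two common ones with NumPy: Pearson's r, and Spearman's rank correlation (Pearson's r computed on ranks, with tied values given their average rank), which is often preferred for ordinal ratings such as content-analysis scores. The function names are hypothetical.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values in sorted order.
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(a, b) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b) -> float:
    """Spearman rank correlation: Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))
```

For example, spearman(method_a_scores, word_counts) would measure how consistently higher-rated comments are also longer, without assuming the relationship is linear.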

A paper on the initial stages of my research that I submitted to Research in Learning Technology has been peer reviewed and accepted, and I am awaiting the outcome of deliberations on the changes I’ve made prior to publication later this year. A paper I hoped to get into the Learning Analytics Special Edition of Transactions on Learning Technologies was rejected (2 to 1 against publication – can’t win ’em all!). But they’ve suggested I re-submit following changes to the text. I’ve just re-written the abstract, which goes like this:

Typically, learners’ progression within Computer-Supported Collaborative Learning (CSCL) environments is measured via analysis and interpretation of quantitative web interaction measures (e.g. counting the number of logins, mouse clicks, and accessed resources). However, the usefulness of these ‘proxies for learning’ is questioned as they only depict a narrow spectrum of behaviour and do not facilitate the qualitative evaluation of critical reflection and dialogue, an essential component of collaborative learning. Research indicates that pedagogical content analysis methods have value in measuring critical discourse in small-scale, formal, online learning environments, but little research has been carried out on high-volume, informal, Massive Open Online Course (MOOC) forums. The challenge in this setting is to develop valid and reliable indicators that operate successfully at scale. In this paper we test two established pedagogical content analysis methods in a large-scale review of comment data randomly selected from a number of MOOCs. Pedagogical Scores (PS) are derived from ratings applied to comments by a group of coders, and correlated with linguistic and interaction indicators. Results show that the content analysis methods are reliable and are very strongly correlated with each other, suggesting that their specific format is not significant. In addition, the methods are strongly associated with some relevant linguistic indicators of higher levels of learning (e.g. word count and occurrence of first-person pronouns), and have weaker correlations with other linguistic and interaction metrics (e.g. sentiment, ‘likes’, words per sentence, long words). This suggests promise for further research in the development of content analysis methods better suited to informal MOOC forum settings, and in the practical application of linguistic proxies for learning, specifically using Machine Learning techniques to automatically approximate human coding and provide realistic feedback to instructors, learners and learning designers.
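
The abstract reports that the content analysis methods are reliable but does not name the statistic. One common choice for agreement between two raters, or between a human coder and a classifier built to approximate human coding, is Cohen's kappa, which corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is an illustration under that assumption, not necessarily the measure used in the paper; with eight coders, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would usually be more appropriate.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # p_o: proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected if each rater assigned categories independently at their own rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        # Both raters used one identical category throughout; kappa is undefined,
        # and this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Comparing a classifier's labels with a human coder's labels via cohens_kappa gives a more honest picture than raw percentage agreement, since a classifier that always predicts the most common category can score well on the latter.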

Just need to re-do the rest now…

I’ve also undertaken two online introductory courses in using the Weka machine learning workbench application and am currently waiting for the Advanced course to start. I’m also attending the Learning Analytics and Knowledge Conference (LAK16) in Edinburgh next week, where I’m very much looking forward to taking a workshop in data mining (using Weka), as well as attending loads of presentations and engaging in some serious networking.

Also, I’m very much looking forward to the summer (hence the photo at the top of the page).