DAL MOOC – Weeks 3/4 Social Network Analysis


‘dalmooc’ Twitter Search NodeXL Graph/Tim O’Riordan ©2014/cc-by-sa 3.0

Sadly I couldn’t get Gephi (the recommended network visualisation tool for this course) to work, as it involved downgrading my version of Java, and the many comments on this issue in the DAL MOOC discussion forum didn’t fill me with confidence. So, the social network graph shown above was created using the NodeXL template in Excel. It’s not as pretty as some graphs I’ve seen, but it works well enough.

The graph is built using the ‘import from Twitter search network’ function (I’ve listed the settings I used to create it at the foot of this post) and shows those mentioning ‘dalmooc’ in their tweets between 3 and 11 December 2014. It clearly demonstrates the centrality of two tutors (George Siemens and Dragan Gašević) to the discussion in the final weeks of the MOOC – which is probably not the preferred outcome for a course that aims to engage learners in collaboration and co-creation of knowledge. However, in-depth analysis was carried out by more experienced hands than mine, which I report later in this post.

In weeks 3 and 4 the course moved towards a more in-depth discussion of social network analysis and an indication of the common metrics used to analyse learning interactions. Before I get onto that topic, I’ll say something about my understanding of the importance of social networks.

Social Networks

Social network formation is a dynamic process in which individuals typically interact with others similar to themselves [1] and use their network connections as social endorsement [2]. People may choose new acquaintances who are friends of friends – a process known as triadic closure [3]. This type of ‘weak tie’ tends to be more useful than the stronger ties associated with close friends [4, 5]. However, there is some evidence that triadic closure does not support higher levels of trust, and that ‘real world’ interaction plays a significant role in developing trust relationships [6]. This suggests that spatial and social proximity are significant in social networks that depend on high levels of trust. Be that as it may, social interaction is seen as the single most important influence in business, in the workplace, and in job hunting, and online networking exhibits similar attributes.

The analysis of social networks is an important and emerging field of study in education. Vygotsky [8] discusses the incidence of higher internal processes resulting from social interaction; Johnson and Johnson [7] report that social learning is effective; and Garrison et al. [9] assert that social learning has an efficacious effect on critical thinking. Social networks, social activities and social interaction are critical predictors of academic performance [10, 11], and Tinto [12] suggests that social interaction is good for student retention.

Social Network Analysis (SNA) for learning provides deep insights into the different social processes that unfold while learning takes place. So what metrics are useful in assessing and evaluating SNA for learning?

Essentially, SNA in this context interprets network interaction within learning activities and environments and looks at typical SNA metrics like network density, centrality, closeness, betweenness, degree (number of connections), in-degree (who is connected to you), out-degree (who you are connected to), and modularity. As weak ties do most of the work in social networks, SNA for learning explores interaction within the log data of VLEs and online social networking applications used to support learning, and identifies pedagogical practices that support the development of these types of connections.
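
As a rough illustration of three of these metrics (not code from the course), the sketch below computes density, in-degree and out-degree for a small directed reply network using only the Python standard library. The names (`ana`, `ben`, `cat`, `tutor`) and the edges are made up for the example; each pair is assumed to mean "sender replied to recipient".

```python
# Hypothetical reply network: (sender, recipient) means the sender replied to the recipient.
edges = [
    ("ana", "tutor"),
    ("ben", "tutor"),
    ("cat", "tutor"),
    ("tutor", "ana"),
    ("ana", "ben"),
]


def degree_metrics(edges):
    """Return (density, in_degree, out_degree) for a directed graph without self-loops."""
    nodes = {node for edge in edges for node in edge}
    unique_edges = set(edges)  # ignore repeated replies between the same pair
    in_degree = {node: 0 for node in nodes}
    out_degree = {node: 0 for node in nodes}
    for source, target in unique_edges:
        out_degree[source] += 1  # who you are connected to
        in_degree[target] += 1   # who is connected to you
    n = len(nodes)
    # Density: share of the n * (n - 1) possible directed edges that actually exist.
    density = len(unique_edges) / (n * (n - 1)) if n > 1 else 0.0
    return density, in_degree, out_degree


density, in_degree, out_degree = degree_metrics(edges)
print(density, in_degree, out_degree)
```

In this toy network the tutor has the highest in-degree, which is the same pattern of tutor centrality visible in the NodeXL graph at the top of the post.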

Week 3

In the week 3 Google Hangout, Shane Dawson, one of the key developers of the VLE discussion forum analysis and visualization tool SNAPP, discussed using SNA to look for learners with high ‘betweenness’ scores. These learners tend to communicate early and more widely, and fill ‘structural holes’ between diverse networks – all possible indicators of creativity. When constructing arguments in social networks, learners benefit from having another to work with. Thinking aloud and verbalising involves reflection and regulation of the quality of learning, and activates deeper processes for learning. To support this Dr Dawson referred to Maarten de Laat’s [13] work, which suggests a strong connection between creativity, ‘dialogic’ skills and ‘higher order thinking’. Essentially, the capacity of networked and agile learners to see multiple perspectives (e.g. as shown by high betweenness scores) strongly indicates engagement in higher-quality critical thinking and creative attributes.
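
To make the idea of a learner filling a ‘structural hole’ concrete, here is a minimal sketch of betweenness centrality for an unweighted, undirected graph, using Brandes’ algorithm in plain Python. It is only an illustration: the graph (two tight groups joined by a single learner `x`) is invented, and tools such as SNAPP, NodeXL or igraph have their own implementations and normalisation options.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def betweenness(adjacency):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted graph)."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, starting from the nodes furthest from the source.
        dependency = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each unordered pair was counted from both ends in an undirected graph.
    return {v: score / 2 for v, score in scores.items()}


# Hypothetical network: two triangles (a, b, c) and (d, e, f) bridged by learner x.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "x"), ("x", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
scores = betweenness(build_adjacency(edges))
print(sorted(scores.items(), key=lambda item: -item[1]))
```

Every shortest path between the two groups runs through `x`, so `x` gets the highest score even though it has only two connections.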

In online learning environments learners’ social interaction requires observation and analysis, and may require some intervention (‘scaffolding’). Two kinds of behaviour may require support: learners who ‘bounce’ around a network, rapidly moving from one subject to another, may be dissatisfied with co-creating/collaborative environments; and ‘over-communicative’ learners who dominate interaction may require attention. In addition, while social capital may be accrued by developing high betweenness centrality, the linguistic content of the messages is critical to understanding their value. So, responding to others with useful and ‘on-task’ messages has a positive effect on a learner’s social capital, while thanks and compliments have a negative association.

Week 4

In week 4 we moved onto making sense of SNA in a learning context. In his introduction Dr Gašević was clear that the key role of learning analytics is not simply to gather online quiz scores, course grades (which can only provide a snapshot of achievement), or trivial measures (e.g. the number of VLE log-ins), but to analyse and interpret dynamic learning products (unstructured text in online comments, tags, or blogs) [14]. Learning analytics should be about learning, and the critical areas under examination need to include: learning design, community building, creativity, social capital, academic performance, and distributed pedagogy.

To demonstrate analytics in practice, Dr Gašević introduced the course’s two ‘data tzars’, Vitomir Kovanovic and Srecko Joksimovic. The ‘tzars’ had collected, cleaned, analysed, and constructed three directed, weighted social graphs from the first two weeks of the MOOC, from data generated by learners and course leaders within the edX discussion forum, Facebook and Twitter feeds. They carried out four main activities:

  1. Calculated betweenness and degree centrality
  2. Calculated linguistic properties for each message and student (i.e. LIWC, coherence, Coh-Metrix) – number of words in a sentence and average number of words per sentence, averaged per student
  3. Ran regression analyses between betweenness and linguistic properties – the relationship between what learners write (and write about) and their position in the graph
  4. Made visualisations
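
The second and third activities can be sketched in a few lines of standard-library Python. This is not the ‘tzars’’ pipeline (they used R, Python, LIWC and Coh-Metrix); it only shows the shape of the calculation under simplifying assumptions: sentences are split on `.`, `!` and `?`, the regression has a single predictor, and the word counts and betweenness scores are invented.

```python
import re
from statistics import mean


def words_per_sentence(message):
    """Average number of words per sentence, splitting naively on . ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    if not sentences:
        return 0.0
    return mean(len(s.split()) for s in sentences)


def simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical per-student values: total word count and betweenness centrality.
word_counts = [40, 85, 120, 60]
betweenness_scores = [1.0, 4.5, 7.0, 2.5]
slope, intercept = simple_regression(word_counts, betweenness_scores)
print(words_per_sentence("I agree. This is because the graph is dense."), slope, intercept)
```

A positive slope here would correspond to the finding reported below that word count predicts network position; a real analysis would use several linguistic predictors and report significance.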

The main outcomes were that:

  • Learners who post messages with deeper cohesion (e.g. continuing a thread, quoting others, asking questions, expressing appreciation or agreement) tend to be the central nodes in the network.
  • Cognitive processes and word count are the best predictors of network position – at this stage of the course.

In their recent conference paper Kovanovic, Joksimovic, Gašević, & Hatala [15] assert that messages containing affective (e.g. emotional, humorous, or self-disclosing), cohesive (e.g. quoting others, asking questions, complimenting, agreeing) and interactive (e.g. addressing named persons, using inclusive pronouns, greetings) facets of social presence “significantly predict the network centrality measures commonly used for measurement of social capital”.

This is fairly intuitive: after all, the more you say – and the more friendly you are – the more likely you are to be central in a network. However, it also appears that whenever different cognitive processes within the language are attended to, this has a positive impact on indicators of social capital.

The messages they analysed tended to have high deep cohesion and low referential cohesion, which indicates that learners in these networks are demonstrating good networked learning skills and are employing deep levels of knowledge construction. They are all good at building knowledge, building connections, and sharing ideas within the environment.

Metrics, Graphs and Tools

The key metrics used in this study were:

  • Word count (WC – count of words in a message)
  • Causation (cause – because, effect, hence)
  • Cognitive processes (cogmech – cause, know, ought)
  • Text coherence (LSA – average similarity between sentences in a message)
  • Deep cohesion (the extent to which the ideas in the text are cohesively connected at a deeper conceptual level that signifies causality or intentionality)
  • Referential cohesion (the extent to which explicit words and ideas in the text are connected with each other as the text unfolds.)
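
To give a feel for how the first few of these metrics are calculated, here is a deliberately simplified sketch. It is not LIWC, LSA or Coh-Metrix: the causation word list contains only the three example words given above (the real LIWC dictionaries are far larger), and coherence is approximated by bag-of-words cosine similarity between adjacent sentences rather than by latent semantic analysis.

```python
import math
import re
from collections import Counter

# Only the example words listed above; NOT the actual LIWC 'cause' dictionary.
CAUSATION_WORDS = {"because", "effect", "hence"}


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_count(message):
    return len(tokenize(message))


def causation_ratio(message):
    """Share of words in the message that appear in the causation word list."""
    tokens = tokenize(message)
    if not tokens:
        return 0.0
    return sum(token in CAUSATION_WORDS for token in tokens) / len(tokens)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def coherence(message):
    """Average similarity between adjacent sentences (bag-of-words stand-in for LSA)."""
    sentences = [Counter(tokenize(s)) for s in re.split(r"[.!?]+", message) if s.strip()]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


message = "Hence the effect is clear because we tested it."
print(word_count(message), causation_ratio(message), coherence(message))
```

Deep and referential cohesion are left out on purpose: they depend on Coh-Metrix’s own linguistic models and cannot be reproduced meaningfully in a few lines.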

The following graphs were produced:


Twitter Big Picture/DAL MOOC edX ©2014

edX Big Picture/DAL MOOC edX ©2014

Facebook Big Picture/DAL MOOC edX ©2014


Shapes: students = circles; instructors = squares
Nodes: colour = community, size = betweenness centrality, label = out-degree (# replies)
Twitter edges: blue = retweet, red = mention, green = reply
Facebook edges: blue = comment, red = like

The ‘Tzars’ Toolkit:

  • R (igraph) and Python – graph extraction and analysis
  • LIWC – Linguistic Inquiry and Word Count, text analysis software that calculates the degree to which people use different categories of words across a wide variety of texts
  • SEMILAR – The Semantic Similarity Toolkit (SEMILAR) “software environment offers users, researchers and developers easy access to fully-implemented semantic similarity methods.”
  • Coh-Metrix – a computational tool that produces linguistic and discourse representations of text.

Additional analysis will be carried out, including:

  • Keyword extraction (e.g. Alchemy)
  • Topic modelling – co-occurrence graphs vs LDA (Latent Dirichlet Allocation, a generative model that describes each document as a mixture of topics and each topic as a distribution over words).
  • Other centrality measures vs LIWC/Coh-Metrix (e.g. degree, closeness centrality)
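
The co-occurrence graph mentioned in the second item can be illustrated with a short sketch. This is an assumption about what such a graph might look like, not a description of the planned analysis: keywords are nodes, and the weight of an edge is the number of messages in which both keywords appear. The messages and keyword set are invented, and tokenisation is a plain whitespace split.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, keywords):
    """Count, for each keyword pair, the number of documents containing both."""
    counts = Counter()
    for document in documents:
        present = sorted({word for word in document.lower().split() if word in keywords})
        counts.update(combinations(present, 2))  # pairs are in alphabetical order
    return counts


# Hypothetical forum messages and keyword list.
documents = [
    "network analysis of learning",
    "learning network",
    "analysis tools",
]
keywords = {"network", "analysis", "learning"}
edge_weights = cooccurrence_edges(documents, keywords)
print(edge_weights.most_common())
```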

NodeXL Tweet Search Import:

  • Term: dalmooc, 85 tweets, between 3/12 and 11/12
  • Edge – colour = relationship, width = relationship, label = relationship
  • Vertices (nodes) – colour = followed [red=most, green=least], shape = betweenness centrality [square = >100], size = betweenness centrality, label = vertex name
  • Dynamic filters – Out-degree >1
  • Layout: Fruchterman-Reingold



  1. McPherson, M, Smith-Lovin, L and Cook, J M (2001). Birds of a Feather: Homophily in Social Networks. In Annual Review of Sociology Vol. 27: 415-444.
  2. Karlan, D, Mobius, M, Rosenblat, T, and Szeidl, A (2009). Trust and social collateral. In The Quarterly Journal of Economics, 124(3), 1307-1361.
  3. Rapoport, A (1953). Spread of information through a population with socio-structural bias: I. Assumption of transitivity. In The Bulletin of Mathematical Biophysics, 15(4), 523-533.
  4. Granovetter, M S (1973). The Strength of Weak Ties. In American Journal of Sociology, 78(6), 1360-1380. The University of Chicago Press.
  5. Watts, D J (2003). Six Degrees: the Science of a Connected Age. London: W W Norton and Company
  6. Bapna, R, Gupta, A, Rice, S, and Sundararajan, A (2011). Trust, Reciprocity and the Strength of Social Ties: An Online Social Network based Field Experiment. In Workshop on Information Systems and Economics.
  7. Johnson, D W, & Johnson, R T (2009). An Educational Psychology Success Story: Social Interdependence Theory and Cooperative Learning. In Educational Researcher, 38(5), 365-379.
  8. Vygotsky, L S (1978). Mind in society: The development of higher psychological processes. M. Cole, V. John-Steiner, S. Scribner, & E. E. Souberman, (Eds.) Cambridge, MA: Harvard University Press.
  9. Garrison, D R, Anderson, T, & Archer, W (2001). Critical thinking, cognitive presence, and computer conferencing in distance education. In American Journal of Distance Education, 15(1), 7-23.
  10. Gašević, D, Zouaq, A, Janzen, R (2013). ‘Choose your classmates, your GPA is at stake!’ The association of cross-class social ties and academic performance. In American Behavioral Scientist, 57(10), 1459-1478.
  11. Astin, A (1993). What matters in college: Four critical years revisited. San Francisco: Jossey-Bass.
  12. Tinto, V (2006). Research and Practice of Student Retention: What Next? In Journal of College Student Retention: Research, Theory and Practice, 8(1), 1-19.
  13. De Laat, M, Chamrada, M, & Wegerif, R (2008). Facilitate the facilitator: Awareness tools to support the moderator to facilitate online discussions for networked learning. In Proceedings of the 6th International Conference on Networked Learning, 80-86.
  14. Gašević, D, Dawson, S, Siemens, G (2015). Let’s not forget: Learning analytics are about learning. In TechTrends (in press).
  15. Kovanovic, V, Joksimovic, S, Gasevic, D, & Hatala, M (2014). What is the Source of Social Capital? The Association between Social Network Position and Social Presence in Communities of Inquiry. In Proceedings of G-EDM 2014: Workshop on Graph-based Educational Data Mining at Educational Data Mining Conference (EDM 2014), July 4-7, 2014, London, UK.

DAL MOOC – Week 2 Data Wrangling

Tony Hirst/Sam Easterby-Smith ©2007/cc-by-nc-sa 2.0


In this video, Open University academic Tony Hirst talks about managing and analysing data, following the “4 Steps of Data Wrangling”: Clean, Shape, Augment, and Look. As with the other two videos I’ve reviewed, I’ve followed the spirit of the ‘revise/remix’ ethos of the course and have edited out the glitches (and enlarged the slides).

In summary, Tony provides a brief overview of the following data wrangling tools:

He demonstrates pivot tables and Sankey diagrams, and suggests looking for outliers, similarities and differences, and trends when exploring data for visualisation.
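
For anyone who hasn’t met pivot tables before, the idea can be sketched in a few lines of standard-library Python. This is not taken from Tony’s video: the rows, the column names (`week`, `platform`, `posts`) and the `pivot` helper are all made up to show the ‘Shape’ step, in which long-format records are summed into a row-by-column summary.

```python
from collections import defaultdict

# Hypothetical long-format records, one row per batch of posts.
rows = [
    {"week": 1, "platform": "Twitter", "posts": 12},
    {"week": 1, "platform": "Facebook", "posts": 5},
    {"week": 2, "platform": "Twitter", "posts": 9},
    {"week": 2, "platform": "Twitter", "posts": 3},
]


def pivot(rows, index, column, value):
    """Sum `value` for every (index, column) combination found in the rows."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[column]] += row[value]
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {key: dict(columns) for key, columns in table.items()}


summary = pivot(rows, "week", "platform", "posts")
print(summary)
```

Laid out like this, a missing combination (no Facebook posts in week 2) or an unusually large value stands out, which is the kind of outlier or difference worth looking for before visualising.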

Tony also quotes John Tukey’s statement from half a century ago that computers would allow people to become “journeymen carpenters of data analytics”, and quotes Leland Wilkinson to support the use of powerful tools to make sense of data and develop data narratives.


See more of Tony’s thinking at blog.ouseful.info and at github.com.