Student Perspectives: Warding gestures and flourishes: Text analysis and the creative writing process

Mahmud El-Shafey considers the uses of text analysis applications for creative writing.

You can follow Mahmud on Twitter @MahmudElShafey.


Over the past few months, I have balanced my LIS studies with a Faber Academy course in fiction writing where I am trying to develop a sci-fi novel (elevator pitch: The Arab Spring… in space!). One might think that the two would have very little overlap, but I have experienced an interesting cross-pollination of ideas (I’m looking at you, new A.I. character Otlet). Most recently, I have seen how text analysis can serve as a useful tool for creative writing, particularly in the editing process.

Running the first 30,000 words of my first draft through Voyant, an open-source web-based application for performing text analysis, helped clarify a number of things for me. First, a quick glance at the word cloud feature provided immediate revelations. Some were obvious: yes, my main character’s name (Iskander) is the most used term, followed closely by other character names (Damietta, Lebanon, Taki). Some were less obvious. Why is “greeted” so high? And “nodded”? Who are these characters I’ve written who spend all their time greeting and nodding at each other, while there are only two instances of “goodbye”?

More importantly, some words that I felt certain would be among the most used terms were completely absent. These obviously loomed larger in my imagination than the physical text.

In addition to the word cloud feature, Voyant includes a number of other tools that I found useful:
◾“Phrases” revealed that on three occasions, Taki performed an action “with a flourish” and that on two occasions Iskander “made a warding gesture.”
◾“TermsBerry” was useful in seeing which characters are particularly associated with which words. Iskander was most closely linked to the characters he interacted with, as well as words like “knew” and “thought”, whereas Damietta, a non-POV character, enjoyed a far broader range of observed actions.
◾“Trends” was particularly helpful in tracking characters’ appearances and viewing what is happening in the background of each section of the text.
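
For readers who would rather experiment outside a web application, the same kind of word and phrase counts can be roughly approximated in a few lines of Python. This is only a minimal sketch and not how Voyant itself works: the function names (`tokenize`, `word_counts`, `repeated_phrases`) and the sample sentences are invented for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out simple word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z'’]+", text.lower())


def word_counts(text):
    """Count how often each word appears, as a word cloud would."""
    return Counter(tokenize(text))


def repeated_phrases(text, size=3, min_count=2):
    """Return every run of `size` consecutive words that occurs at least `min_count` times.

    Note: punctuation is ignored, so a phrase may run across a sentence boundary.
    """
    words = tokenize(text)
    grams = Counter(
        " ".join(words[i:i + size]) for i in range(len(words) - size + 1)
    )
    return {phrase: n for phrase, n in grams.items() if n >= min_count}


# Hypothetical sample text, not taken from the actual draft.
sample = (
    "Taki poured the coffee with a flourish. Iskander nodded. "
    "Taki bowed with a flourish. Iskander nodded and greeted Damietta."
)

print(word_counts(sample).most_common(5))
print(repeated_phrases(sample))
```

A real draft would be read from a file rather than a string, and common words such as “the” and “a” would probably need filtering out before the counts become as revealing as a word cloud.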

These insights will certainly prove helpful in the editing and re-writing process. In a final draft, I will make sure that my characters nod less, and pour coffee with fewer flourishes. However, not all of my fellow students agreed with me on the use of text analysis.

I shared an image of my word cloud with my fellow Faber students, sparking off a spate of text analyses. Some who tried it for themselves came away feeling unsatisfied, viewing the process as “mechanical” and potentially stifling creativity. They also pointed out that the analysis could not take into account authorial intent, such as when particular words or colours are purposefully associated with a character, or when a phrase might be repeated for effect.

While there have been many examples of how text analysis can be used by students of a text, there is no reason why it can’t also be used in the same vein by authors during the writing and editing process. But beware, your mileage may vary.

Student Perspectives: Mining For Data

Nina Byrom looks at the benefits and potential drawbacks of integrating data mining into library and archive catalogues from the perspective of the digital humanities. You can follow Nina on Twitter @ByromNina.


The discussion about the rise of text mining software in relation to the Digital Humanities caused me to reconsider what the future of library collections and archives could look like. There is a view that at present, integrating text and data mining software into collections and archive research could provide researchers and collection managers with the ability to work on large-scale projects and documentation efforts that would have previously been too labour-intensive to undertake (Berry, 2019). However, the ability to extrapolate thematic data across entire collections, to distance read on a wide scale, has the potential to impact how digitised library collections are presented in the future. If, for example, a collection was found to have multiple common elements, would there be a danger of it being marketed to users based on those elements alone, potentially excluding less frequent, but still important elements?

There is arguably a precedent for this in the library sector already due to specialised collections, but even in these there is a level of variance. The risk of text mining being used to summarise collections in the broadest terms possible could be balanced out by the level of detail that it can provide, allowing curators to have a deeper understanding of a larger percentage of a collection. This could create a more effective curation cycle, as changes to the contents of the collection would be more easily comparable.

Changes in digital documentation due to the Digital Humanities could also change the way that library users access and use information. This is supported by the integration of digital media into everyday life: Sabharwal (2015) argues that the GLAM sector has adapted to digital presentation and curation approaches in order for a remote audience to be able to access previously inaccessible information (p.125). However, Sabharwal (2015) also argues that in the digital sphere public curation of collections can occur, driven by the rise of social media (p.125). With many institutions already maintaining multimedia social media presences that allow for public critique (Sabharwal, 2015, p.127), there is a potential risk that text mining could be used by institutions to collect and study external public opinion en masse. However, due to the availability of free-to-use text mining software, users could do the same in order to extrapolate meaning from an institution’s digital presence, creating a level of scrutiny that the sector may not be accustomed to receiving.

The availability of text mining also raises another interesting possibility for the library sector: that it could be built into digital catalogues, in order to allow researchers and users to feed their findings based around catalogue contents back into the catalogue itself, using data and text mining in tandem to promote further development. Something similar has already occurred in research projects such as Curating Menus, which cleans and uses publicly sourced datasets about the New York Public Library’s menu collection (Rawson, 2016, p.59). It’s important to note that this relies upon users accessing and inputting information through a single data collection system; however, as Rawson (2016) points out, digital humanities research and the curation of documented data inform each other (p.60), which allows future users to access an increasing level of information from the collection, as ‘anyone can easily download the data set from NYPL’s website’ (Rawson, 2016, p.59). The model of publicly created datasets working in tandem with the developing Digital Humanities could potentially lead to majority publicly curated catalogues in the library sector, where user-generated datasets are monitored by data and text mining software in order to provide a better, more accessible service.


Berry, D. (2019), What are the Digital Humanities? The British Academy Blog [blog], 13 February. Available at: [Accessed 28 November 2020]

Rawson, K. (2016), Curating Menus: Digesting Data for Critical Humanistic Inquiry, In: White, J. and Gilbert, H. eds. Laying the Foundation: Digital Humanities in Academic Libraries, Indiana: Purdue University Press, pp. 59-72

Sabharwal, A. (2015), Digital Curation in the Digital Humanities: Preserving and Promoting Archival and Special Collections, Kidlington: Chandos Publishing

Student Perspectives: Why Big Data and Democracy?

Here, Emilio Sensale explores the connections between the digital revolution and democratic political systems and the impact of big data on people’s ability to exercise their own judgement without technological interference. This post was originally published on Emilio’s blog. You can follow Emilio on Twitter @EmilioSensale.

I never really liked the phrase ‘Digital Era’ used to define the times we are living in. The main reason is that all the focus is on the technologies; please don’t get me wrong, I’m not technophobic or nostalgic for the ‘good old times’ when all the devices, including the Internet, mobile phones, etc., were still only a futuristic idea, but what worries me is the dehumanizing connotation that such technology carries. However, the implication that this era is characterised by the predominance of digital content is not far from reality, even though we should always remember that behind all that there’s human craft and intervention. In fact, with my first post I’d like to explain the reasons why I chose the title Big-Data and Democracy for my blog, so in order to fulfill this task I think that a good start might be finding a definition for both Democracy and Big-Data.

The former derives from the Greek word dēmokratiā – dēmos ‘people’ and kratos ‘rule’ – and Wikipedia defines it as ‘a form of government in which the people have the authority to choose their governing legislation’. I think that the stress should be on the words ‘authority’ and ‘choose’. As for Big-Data, Wikipedia defines it as ‘a field that treats ways to analyze, systematically extract information, or otherwise deal with data sets’. I’d like to focus on the processes of analysis, extraction and use of personal data; furthermore, I believe that it is important to consider the concept of ‘consent’. Neither of these definitions is meant to be exhaustive; this is just an attempt to create a framework of ideas that I’d like to explore and discuss.

When I think about the constant flow of data that IT companies collect and store from our online interactions, I cannot help wondering how much control we are allowing them to have over our decisions, especially when it comes to primary needs – or not – and choices that will have any sort of influence on our lives. How many of those needs and ideas are really ‘real’? How thin is the line between violating privacy and legitimately profiting from data? Is it ethically acceptable and, if so, to what extent? Are we aware of what that click on the ‘allow all cookies’ button does? What are we really agreeing to? To what extent are we all really well informed about our decisions, and how reliable are the most common sources of information – e.g. Wikipedia? My aim here is not to give a comprehensive answer to all these questions but to keep looking for different angles, unleash my curiosity, explore further a reality that is often taken for granted and discuss changes that are considered inevitable. Most of these subjects have already been analyzed and many debates have been raised in different fields of study including AIE, Sociology, Psychology, Anthropology, Philosophy, etc.; in fact, an extensive across-the-board approach seems to be the most viable one, even though I’m aware that the risk of losing track of the main topics is quite high.

With that said, I’d like to go back to some words highlighted in the second paragraph: authority and choice. If we consider these two ideas in the context of a democratic system, I believe that any interaction between them leads us to two more concepts: power and delegation. In fact, what we do as members of a democratic society is choose – delegating authority through our vote – who will exercise legislative, administrative and judicial powers. What is paramount in making a choice is the creation of opinions, hence all the tools that we use to help us with this task should be reliable, transparent and accessible. Fact checking may be helpful in many situations but it may not be enough if we don’t keep in mind the importance of understanding and interpreting data – or information – in general. So if we go back to the definition of Big-Data it might be easier to understand what my concerns are regarding the influence that a specific usage of tools such as profiling and data mining can have on the process of creating opinions.

There are no answers in this blog; in fact, my purpose is to discuss, analyze and compare opinions with people who may have similar – or not necessarily similar – concerns.

Student Perspectives: Lockdown, and What the Digital World Cannot yet Replace

In this post, Nina Byrom considers how Floridi’s theory of digital proxies has become a reality in the context of Covid-19 but also looks at how lockdown has revealed the limitations of the digital in acting as a stand-in for three-dimensional reality. This post was originally published on Nina’s blog. You can follow Nina on Twitter @ByromNina.

At the start of 2020 technology and digital data already played a large part in our society, allowing us to access the ‘constantly growing amount of valuable information available today, on any topic’ (Floridi, 2015, p.489) with ease. We could talk, shop, stream, and search without leaving our homes. Then lockdown was imposed, and overnight ‘living onlife and in the infosphere’ (Floridi, 2015, p.489), which was previously a choice, became our only option.

The digital world now had to replace areas of our lives that it had previously only been partially involved in, with some replacements being smoother than others. Online shopping was arguably an easier transition as the experience remained relatively untouched; you could still browse, find what you needed, and have it all delivered. The digital experience here is what Floridi (2015) termed a proxy, where something is not only a representation of an item, but also acts as a suitable replacement for the item (p.488).

Another example of the digital world potentially becoming a proxy for an in-person aspect of our lives is the continuing use of video conferences to replace group working and learning environments. In professions where remote working is a viable option it could become standard practice in the future, further integrating technology into our lives. However, while a digital environment does provide users with freer data access, it also increases the amount of data being acquired about users. Conversations which would have taken place in-person are instead played out in chats, and in a digital learning environment the element of socialisation that comes from being around others may well be lost. The future of fully digital work and learning relies on how much we want to use the digital world as a proxy for our offline lives, considering how much more data could be collected on us.

While some areas of our lives were replaced by digital proxies successfully, there were also areas where digital data failed to act with complete success. One of these areas was the digitisation of art and sculpture that had to replace our ability to visit museums. While digitisation does provide an image, it is arguably without a proper sense of perspective; you cannot move in relation to the piece as you would in-person because the image is fixed. Virtual tours could have replaced the experience of visiting a museum, but they are also limited by the need to cater to all potential users, and so certain data is prioritised. Scale is another issue, as a digital image will always be constrained by the size of the screen it is being viewed upon. This is an example of the digital world functioning as ‘a degenerate proxy that only stands for, but cannot behave on behalf, or act instead’ (Floridi, 2015, p.488) of an in-person experience. The image stands for the art in question, but cannot truly replace it.

In a similar way, via the digital world we can access location data, videos, weather reports, and learn about the history of a place we have never been to. But that data cannot give you the experience of being in that place. While we could potentially live entirely via the digital world, and digital data is already a large part of our lives for better or for worse, lockdown has shown the areas where that data cannot be a proxy for in-person experience. Looking out of a digital window is not the same as being able to go outside.


Floridi, L. (2015) ‘A Proxy Culture’, Philosophy and Technology, 28(4), pp. 487-490. DOI:

Student Perspectives: Franco-Arabic and the Secret History of Writing

The Student Perspectives category collects posts written by current CityLIS students. This post is written by Mahmoud El-Shafey, who considers the evolution of spoken and written Arabic. Mahmoud is on Twitter @MahmudElShafey. This post first appeared on Mahmoud’s blog on 17/10/20.

One of the best things about the further “reading” on my LIS master’s is that this doesn’t just include dry academic textbooks and reports, but also news stories, documentaries and even YouTube videos. One excellent resource that students were recommended for our Story of Documents class was Dr. Lydia Wilson’s BBC4 series The Secret History of Writing.

The first episode in particular, which deals with the Rebus Principle and the birth of alphabets, was mind-blowing. I couldn’t have been more surprised to learn that ancient Egyptian hieroglyphics are “sleeping” in the very letters I am typing out now. I’ll never look at the letter A again without thinking of the Egyptian hieroglyph for a bull.

And the same applies to the other language I speak: Arabic. Both Arabic and Latin type have the same origin. In the third episode, Dr. Wilson delves into the rise of Franco-Arabic or Arabic chat (personally, I’ve always heard it called Arabizi): Arabic text written using Latin type, but with a few small additions. For example, ع (ain) becomes a 3 and the ء (hamza or glottal stop) becomes a 2. Is this the future of Arabic?
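
For the programmers among my readers, the two digit substitutions mentioned above can be written down as a tiny lookup table. This is only an illustrative sketch: the function name `digits_used` is made up, and the table covers just the two letters named in this post, whereas real Franco-Arabic uses more digits and varies from region to region.

```python
# The two substitutions described in the post, written as Unicode escapes
# so the right-to-left letters do not disturb the layout of the code.
AIN = "\u0639"    # the letter ain, written as 3
HAMZA = "\u0621"  # the hamza (glottal stop), written as 2

# Assumption: only these two digits are mapped; other digits are left alone.
DIGIT_TO_LETTER = {"3": AIN, "2": HAMZA}


def digits_used(arabizi):
    """Return each mapped digit found in the text with the Arabic letter it stands for."""
    return {ch: DIGIT_TO_LETTER[ch] for ch in arabizi if ch in DIGIT_TO_LETTER}


# A short phrase in Latin letters plus digits.
for digit, letter in digits_used("Kul shay2 waq3").items():
    print(digit, "stands for", letter)
```

Going the other way, from Latin letters back to Arabic script, is a much harder problem, because the same Latin letter can stand for several Arabic ones.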

As a former Arabic translator, I’m not sure how I feel about all this. Arabic is a wonderful but complicated language. I’m not sure if the Latin alphabet is able to encompass its linguistic complexities, even if this is only used for simple communication purposes. Things like how nouns and adjectives are declined according to case, gender, and even number could easily be lost in transcription.

Moreover, there isn’t any one, single Arabic. Arabic is a diglossia, featuring “high” and “low” versions. Franco-Arabic is, so far at least, mostly used for ease of communication. It originated in online chat and today is mostly used for text messaging. But everybody speaks a different version of this “low” form of Arabic. Regional dialects can vary significantly, not just in terms of what words are used, but also how the same words are pronounced.

Simply saying “How are you?” could be “Ezzayak?” (Egyptian), “Kayf al-Hal?” (Gulf Arabic), “Ish lawnak?” (Levantine) or “Kidera?” (Moroccan Arabic). And I’m sure I’m missing a lot of other examples.

In any case, I decided to give Franco-Arabic a whirl, and so the below is my transcription of the beginning of Palestinian poet Mahmoud Darwish’s famous poem Mural. Immediately, even the title of the poem causes a problem. In Egyptian Arabic, the Arabic that I speak, the ج is pronounced with a hard g sound. But in Gulf or Levantine Arabic, it would be pronounced with a softer j sound. So, is it Gaddariya or Jaddariya? Well, that depends on who is speaking.


Haza huwa ismak
8alit imra2a
W ghabat fi al mumur al lawlabi

Ara al sama2 hunak fi mutanawal al aydi
W yahmaluni ginah hamama bay9a soub
Tafoola ukhra. W lam ahlam b2nee
Kunt ahlam. Kul shay2 waq3. Kunt
A3lam inani alqi b nafsi ganiban
W ateer. Sawf akoon ma sa2seer fi
Alfalak al akheer. W kol shay2 abyad
Albahr alm3laq foq saqf ghamama
Bay9a. W illa shay2 abyad fi
Sama al mutlaq al bayda2. Kunt, w lam
Akun. f2na waheed fi nawahi ha zehe
Alabadiya albayda2. Gi2tu qabeel miy3di
F lam yazhar malak wahid l yaqool li:
(Maza fa3lt, hunak, fi al dunya?)
W lam asm3 hetaf al tayibeen, w la
Aneed al khati2een, ana waheed fi al baya9,
Ana waheed…

And here is the translation:

Mural by Mahmoud Darwish

This is your name /
a woman said
and disappeared in the spiralling corridor
I could see the sky over there within my grasp.
A dove’s white wing carried me toward
another childhood. I wasn’t dreaming
that I was dreaming. Everything was realistic. I knew
I was tossing myself to the side
before I flew. I would become what I want
in the final orbit. Everything was white:
the sea hanging above the roof of a white
cloud was nothingness in the white
sky of the absolute. I was
and I wasn’t. I was alone in the corners of this
eternal whiteness. I came before my time and not
one angel appeared to ask me:
“What did you do, there, in life?”
And I didn’t hear the chants of the virtuous
or the sinners’ moans, I was alone in whiteness,

And so, what have I learned from this little exercise? Transcribing into Franco-Arabic did not come naturally to me, not at all. I had to stop many times and think, should I use a 2 here? Should this d become a 9? I could have transcribed this excerpt into Arabic in half the time, although admittedly, a younger person, who uses Franco-Arabic on a daily basis, might not have faced this obstacle.

I think the biggest thing I have learned is that the Arabic alphabet will survive. We should not fear that Franco-Arabic would or even could replace it. Arabic has proven wonderfully persistent. And yes, there are obvious cultural and religious reasons for this. And yes, this persistence does have its downsides, as with Arabic’s resistance to the moveable type revolution. But ultimately, I think that Arabic, this complicated language with its extra vowels written counter-intuitively from right-to-left, is here to stay.

That’s not to say Franco-Arabic is going anywhere, either. This is clearly not a flash in the pan. It will continue to grow as more and more people use it to chat and WhatsApp and text. And isn’t that – everyday communication – ultimately what language is for?

