Computer Science Academic Vocabulary List (CSAVL): Common academic words in CS textbooks and articles

This page describes the Computer Science Academic Vocabulary List (CSAVL), giving information on what the CSAVL is, how it was developed, and how it differs from other word lists.


To explore the list more fully, try the CSAVL highlighter (on this site).


What is the CSAVL?

The Computer Science Academic Vocabulary List (CSAVL) was developed in 2021 by David Roesler, working at the Department of Applied Linguistics, Portland State University. The CSAVL is a list of academic words which occur frequently in the kind of textbooks and journal articles used by Computer Science (CS) undergraduates at university in the USA, and is intended to provide an efficient tool for CS students to reach a minimum comprehension threshold of 95% coverage. It comprises 904 words in the main list, with an additional 702 in a supplemental, more technical list (the CSAVL-S). In total, the two lists provide 19.90% coverage of a second, evaluative corpus, or 1.24% coverage per 100 words in the list.


Unlike some other field-specific (i.e. subject-specific) academic word lists, the CSAVL does not exclude general words. This is because many polysemous words (i.e. words with multiple meanings) would otherwise be excluded from the list, despite having distinct meanings in Computer Science that students need to learn. The author gives the following examples: bug, port, tree, string, volume, mouse, instance, for and while.


How was the CSAVL developed?

In order to develop the CSAVL, Roesler created two corpora, one to develop the list and a second to evaluate it. These were called the Computer Science Academic Corpus 1 and 2 respectively (abbreviated to CSAC1 and CSAC2). Each corpus was drawn from textbooks and journal articles, with each text type covering 10 sub-disciplines, giving 20 sub-disciplines in total. The textbooks were chosen for relevance, influence and usage, and were intended to represent the core texts students might encounter on an undergraduate CS course, while the journal articles were chosen for relevance, recency and usage (measured by unique downloads), and were ones which students might use for independent research projects. The CSAC1 comprised 12 textbooks and 142 articles, and formed a corpus of 3.5 million words, while the CSAC2 consisted of chapters from 10 textbooks and 42 articles, totalling 700,000 words.


Words for the CSAVL were selected in a broadly similar way to those in the AVL by Gardner and Davies. Specifically, the following six criteria were used.

  • Minimum frequency. Words needed to occur at least 100 times in the CSAC1.
  • Discipline connection. Words needed to occur at least 1.5 times as often in the CSAC1 as in a corpus of general English, or have a CS-specific meaning.
  • Range. Words needed to occur at least 20% of their expected frequency in each of the 20 sub-disciplines.
  • Dispersion. Words needed to be evenly distributed throughout the corpora.
  • Discipline measure. Words could not occur at more than three times their expected frequency in more than three sub-disciplines (to exclude words which might have been specific to certain sub-disciplines).
  • Additional meaning criterion. High frequency words needed to appear in CS technical dictionaries. This criterion was added to exclude common words such as 'we'.
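
The criteria above with explicit numeric thresholds can be sketched in code. The following Python example is only an illustration, not Roesler's actual procedure: the class and function names are hypothetical, "expected frequency" is assumed to mean the word's overall rate in the corpus multiplied by the size of the sub-discipline, and the discipline connection is assumed to compare relative (per-token) frequencies. The dispersion and additional meaning criteria are left out because the description above gives no thresholds for them.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the criteria listed above.
MIN_FREQUENCY = 100        # minimum occurrences in the CS corpus
MIN_RATIO = 1.5            # CS rate vs general-English rate
MIN_RANGE_SHARE = 0.20     # share of expected frequency per sub-discipline
MAX_OVERUSE_FACTOR = 3.0   # "more than three times the expected frequency"
MAX_OVERUSED_SUBS = 3      # allowed number of such sub-disciplines


@dataclass
class WordStats:
    """Hypothetical container for one word's counts."""
    word: str
    cs_count: int            # occurrences in the CS corpus
    cs_size: int             # total tokens in the CS corpus
    general_count: int       # occurrences in the general corpus
    general_size: int        # total tokens in the general corpus
    sub_counts: List[int]    # occurrences in each sub-discipline
    sub_sizes: List[int]     # total tokens in each sub-discipline
    has_cs_meaning: bool = False


def expected_counts(s: WordStats) -> List[float]:
    # Assumption: expected count = overall rate x sub-discipline size.
    rate = s.cs_count / s.cs_size
    return [rate * size for size in s.sub_sizes]


def meets_minimum_frequency(s: WordStats) -> bool:
    return s.cs_count >= MIN_FREQUENCY


def meets_discipline_connection(s: WordStats) -> bool:
    if s.has_cs_meaning:
        return True
    if s.general_count == 0:
        return s.cs_count > 0
    cs_rate = s.cs_count / s.cs_size
    general_rate = s.general_count / s.general_size
    return cs_rate / general_rate >= MIN_RATIO


def meets_range(s: WordStats) -> bool:
    return all(count >= MIN_RANGE_SHARE * exp
               for count, exp in zip(s.sub_counts, expected_counts(s)))


def meets_discipline_measure(s: WordStats) -> bool:
    overused = sum(1 for count, exp in zip(s.sub_counts, expected_counts(s))
                   if count > MAX_OVERUSE_FACTOR * exp)
    return overused <= MAX_OVERUSED_SUBS


def is_candidate(s: WordStats) -> bool:
    return (meets_minimum_frequency(s) and meets_discipline_connection(s)
            and meets_range(s) and meets_discipline_measure(s))
```

With invented counts, a word spread evenly over four sub-disciplines passes every check, while a word concentrated in one sub-discipline fails the range check even though its total frequency is high enough.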

How does the CSAVL differ from other lists?

The CSAVL contrasts with another field-specific academic list, the CSWL or Computer Science Word List, developed by Minshall in 2013. Unlike the CSWL, which excludes words from the GSL (General Service List) and AWL (Academic Word List) and therefore forms a third, supplementary list, the CSAVL is intended to be a stand-alone list of academic vocabulary.


Roesler found that the CSAVL and CSAVL-S combined gave a coverage of 19.90% in the CSAC2, which is more than the 17.26% provided by the AWL and CSWL combination. Additionally, the coverage per 100 words was 1.24 for the former, and 0.55 for the latter, showing that the CSAVL/CSAVL-S are more efficient than the AWL/CSWL, which was a key consideration in the construction of the CSAVL.
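
Coverage here means the percentage of running words (tokens) in a corpus that appear on a word list. A minimal Python sketch of that calculation follows. It is a simplification under stated assumptions: the function names, the sample sentence and the three-word list are invented for illustration, and tokens are matched as exact lower-case word forms with no lemmatisation or word-family grouping, which a real study would need to handle.

```python
import re
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    # Simplified tokeniser: lower-case alphabetic words, optional apostrophe part.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens: Iterable[str], word_list: Set[str]) -> float:
    # Percentage of tokens that appear in the word list.
    tokens = list(tokens)
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in word_list)
    return 100.0 * covered / len(tokens)


# Invented sample data, not taken from the CSAVL or its corpora.
sample_text = "The compiler parses the string and builds a tree"
sample_list = {"compiler", "string", "tree"}
sample_coverage = coverage(tokenize(sample_text), sample_list)
```

In this toy example three of the nine tokens are on the list, so the coverage is one third of the text.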


In comparing the CSAVL alone with both the AWL and AVL, the author found that the AWL offered slightly better coverage (18.64% compared to 16.06%), the AVL slightly less (12.20%), but again, the CSAVL was more efficient, offering 1.78% coverage per 100 words, compared to 0.62% for the AWL and 0.63% for the AVL.


Finally, in combining the CSAVL and CSAVL-S with the new-GSL, a lemma-based general word list, Roesler found that this gave 94.77% coverage, very close to the minimum 95%. While the GSL/AWL/CSWL combination fared slightly better, with 95.49%, this combination has 8489 words in total, far more than the 3918 for the new-GSL, CSAVL and CSAVL-S. The latter combination again provided better efficiency, at 2.42% per 100 words, in contrast to 1.12% for the GSL/AWL/CSWL combination.
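
The efficiency figures quoted on this page are consistent with dividing the coverage percentage by the number of words in the list, expressed in hundreds. The short Python sketch below reproduces that arithmetic using the coverage figures and list sizes given above. The function name is hypothetical, and the formula is inferred from the reported numbers rather than quoted from the paper.

```python
def coverage_per_100_words(coverage_percent: float, list_size: int) -> float:
    # Efficiency: percentage points of coverage gained per 100 list words.
    return coverage_percent / (list_size / 100)


# (coverage %, number of words), as reported on this page.
reported = {
    "CSAVL": (16.06, 904),
    "CSAVL + CSAVL-S": (19.90, 904 + 702),
    "new-GSL + CSAVL + CSAVL-S": (94.77, 3918),
    "GSL + AWL + CSWL": (95.49, 8489),
}

efficiency = {
    name: round(coverage_per_100_words(cov, size), 2)
    for name, (cov, size) in reported.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value}% per 100 words")
```

Rounded to two decimal places, the results match the reported values of 1.78, 1.24, 2.42 and 1.12 respectively.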


References

Roesler, D. (2021) 'When a bug is not a bug: An introduction to the computer science academic vocabulary list', Journal of English for Academic Purposes, 54, 101044. Available from https://doi.org/10.1016/j.jeap.2021.101044.


Computer Science Academic Vocabulary List

I have asked the author for permission to add the list to the site, and hope to be able to do so soon.





Author: Sheldon Smith    ‖    Last modified: 25 October 2021.

Sheldon Smith is the founder and editor of EAPFoundation.com. He has been teaching English for Academic Purposes since 2004. Find out more about him in the about section and connect with him on Twitter, Facebook and LinkedIn.


