Unfortunately, many of the word lists available online are not in a structured, machine-readable format (e.g., some are PDFs, others are HTML).
This is a collection of those lists converted into .yml and .json, so they
can be readily consumed and manipulated by computers, with an expectation that
they might form the basis of materials generation in web-based applications.
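As a minimal sketch of how one of the converted lists might be consumed, the Python below reads a JSON file into a set of words. The function name `load_word_list` and the flat-array layout are assumptions for illustration only; the actual files in this collection may be named and structured differently.

```python
import json
from pathlib import Path


def load_word_list(path):
    """Read a JSON file assumed to hold a flat array of words.

    Returns a set of lowercased, whitespace-trimmed words.
    """
    with Path(path).open(encoding="utf-8") as handle:
        words = json.load(handle)
    return {str(word).strip().lower() for word in words}
```

A set is used here because membership checks (is this token on the list?) are the most common operation when generating materials from a word list.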
University Word List (UWL)
An 836-item academic vocabulary list that compiles academic English common to a variety of disciplines but excluded from the GSL. Especially useful for academic reading and provides on average some 8.5% coverage of academic texts (Nation & Waring, 1997). Divided into 11 sublists based on frequency.
source: http://www.auburn.edu/~nunnath/engl6240/wlistuni.html
Academic Word List (AWL)
A general-purpose academic word list, particularly for reading, with 570 word families that are not included in the GSL but that have wide range in academic texts across disciplines (based on corpus research in arts, commerce, law, and science). Further divided into 10 sublists that reflect frequency and range.
source: https://www.victoria.ac.nz/lals/resources/academicwordlist
Academic Keyword List (AKL)
930 potentially academic words, grouped into nouns, verbs, adjectives, adverbs and others. The AKL differs from Coxhead’s Academic Word List in that it includes high-frequency words (e.g. aim, argue, because, compare, explain, namely, result) shown to play an essential structuring role in academic prose.
source: https://uclouvain.be/en/research-institutes/ilc/cecl/academic-keyword-list.html
Academic Vocabulary List (AVL)
Based on 120 million words of academic texts in the Corpus of Contemporary American English (COCA). Compared to the AWL, it is based on more recent texts, has greater coverage and gives more information on word families. Sub-genres include History, Education, Law & Political Science, Social Science, Humanities, Philosophy, Religion, Psychology, Science & Technology, Medicine & Health, Business & Finance. Available for download in both lemma and word family formats.
source: https://www.academicvocabulary.info/
New Academic Word List (NAWL)
963 academic words from a 288 million word corpus comprising the Cambridge Corpus of English, the Michigan Corpus of Academic Spoken English, the British Academic Spoken English corpus and published textbooks.
source: http://www.newgeneralservicelist.org/nawl-new-academic-word-list/
Academic Vocabulary List - Shared Disciplines
The AVL (Academic Vocabulary List, 2013) was reduced to 427 lemmas ordered by part of speech and frequency. The BAWE (British Academic Written English) corpus was used to check for coverage of the AVL. The list uses a cutoff of at least 12 per million with coverage in at least 90% of the disciplines in BAWE. Disciplines: Economics, Business, Sociology, Politics, Psychology, Cybernetics & Electronic Engineering, Architecture, Linguistics, Anthropology, Hotel, Leisure & Tourism Management, Engineering, Computer Science, Meteorology, Health, Law, Philosophy, Agriculture, Planning, Archaeology, Mathematics, History, Biological Sciences, Chemistry, Physics, Food Sciences, Contemporary American Studies, English, Publishing, Medicine, Classics.
source: https://www.anthonyteacher.com/wp-content/uploads/2016/06/AVL.xlsx
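The frequency and range cutoff described for the Shared Disciplines list can be sketched roughly as follows. This is one possible reading of the criteria (overall frequency of at least 12 per million, and at least one occurrence in 90% or more of the disciplines), not the list author's actual procedure. The function name, parameter names and input layout are hypothetical.

```python
def select_shared_lemmas(counts, corpus_size, disciplines,
                         min_per_million=12.0, min_range=0.9):
    """Filter lemmas by overall frequency and by range across disciplines.

    counts       maps lemma -> {discipline: raw occurrence count}  (assumed layout)
    corpus_size  total number of tokens in the corpus
    disciplines  list of all discipline names in the corpus
    """
    selected = []
    for lemma, by_discipline in counts.items():
        # Normalize the raw total to occurrences per million tokens.
        per_million = sum(by_discipline.values()) / corpus_size * 1_000_000
        # Range: share of disciplines in which the lemma occurs at least once.
        present = sum(1 for d in disciplines if by_discipline.get(d, 0) > 0)
        if per_million >= min_per_million and present / len(disciplines) >= min_range:
            selected.append(lemma)
    return sorted(selected)
```

Under this reading, a lemma that is very frequent in a single discipline is dropped, because it fails the range condition even though it passes the frequency cutoff.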
General Science Jargon List
12,000 articles were selected (biomedical and life sciences) to identify words frequently used in the scientific literature. In order to avoid any recency bias, 2,000 articles were randomly selected from six different decades (starting at the 1960s). From these articles, the frequency of all words was calculated. These are non-subject-specific words that are frequently used by scientists. This list contains words with a variety of different linguistic functions. General science jargon can be considered the basic vocabulary of a 'science-ese'. Science-ese is analogous to legalese, which is the general technical language used by legal professionals.
Academic Vocabulary for Middle School Students
The Middle School Content-Area Textbook Corpus (MS-CAT) consists of 109 content-area textbooks designed for sixth, seventh or eighth graders in US public schools. The total word count is over 18 million words. There are five subject-area sub-corpora: English grammar and writing, health, math, science, and social studies and history. Literature and reading material is not included in the English corpus. To build the five MS-CAT lists, the GSL 1000 and 2000 families were excluded. Both AWL words and non-AWL words which reached certain frequencies were included. Each sublist is about 400 words.
source:
Academic Spoken Word List
The ASWL contains 1,741 word families with high frequency and wide range in an academic spoken corpus totaling 13 million words. The list, which features vocabulary from 24 subjects across four equally sized disciplinary subcorpora, is graded into four levels according to Nation’s British National Corpus and Corpus of Contemporary American English lists, and each level is divided into sublists of function words and lexical words. A flemma list is also available.
source: https://osf.io/gwk45/
Soft Science Spoken Word List
The list consists of the 1,964 most frequent and wide-ranging word-families in a 6.5 million word corpus of soft-science speech, which represents 12 subjects (Art, Cultural Studies, History, Philosophy, Political Studies, Psychology, Business, Economics, Education, Law, Management, Public Policy) across two equally-sized sub-corpora (soft-pure, soft-applied). Word families divided into 4 lists. Flemma lists are also available.
source: https://drive.google.com/file/d/1L7bPGjkuSZ773g3MLGdYyFcYsruuVkM
General Service List
A list of vocabulary families reflecting the 2,000 most frequent words in English and representing an average of “around 82 per cent coverage” of various types of texts (Nation & Waring, 1997, p.15). Used as the basis for many graded readers and other ESL/EFL materials.
source: http://jbauman.com/gsl.html
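Coverage figures such as the one quoted for the GSL express the share of running words (tokens) in a text that appear on a list. A minimal sketch of that calculation is shown below. The tokenizer and the function name `lexical_coverage` are assumptions, and the sketch matches surface forms only, so it will understate coverage for lists organised by word family unless the family members are expanded first.

```python
import re


def lexical_coverage(text, word_list):
    """Return the fraction of tokens in `text` that appear in `word_list`."""
    # Simple tokenizer: lowercase alphabetic runs, allowing one internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    known = {word.lower() for word in word_list}
    hits = sum(1 for token in tokens if token in known)
    return hits / len(tokens)
```

For example, `lexical_coverage("The cat sat on the mat", {"the", "on"})` counts three of the six tokens as covered.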
ICE-CORE list
Approx. 1000 words (lemmas & word families) compiled from 7 varieties of English (Canada, East Africa, Hong Kong, India, Jamaica, Philippines, & Singapore) from the International Corpus of English. Validated using the GSL, Nation's BNC, the BNC, an ELT corpus and a sample corpus of 21 English varieties.
source: https://www.sequencepublishing.com/cgi-bin/download.cgi?ICECORE
BNC-COCA list
The BNC/COCA word family lists comprise 29 lists. Twenty-five of the lists (each of 1000 words) contain word families based on frequency and range data. The four additional lists are (1) proper names, (2) swear words, exclamations, and letters of the alphabet, (3) transparent compounds, and (4) abbreviations.
source: https://www.victoria.ac.nz/lals/about/staff/paul-nation#vocab-lists
New General Service List (NGSL)
2800 high-frequency words selected following GSL methodology from a 273 million word section of the Cambridge English Corpus. More focused on second language learners and drawn from a more balanced corpus than the new-GSL according to Browne (2014).
source: https://www.newgeneralservicelist.org
Essential Word List for beginners (EWL)
EWL has 800 headwords and lemmas from the GSL, BNC, BNC/COCA and New-GSL lists with the greatest coverage in 9 spoken and 9 written corpora. EWL has sublists of lexical words and function words. The lexical words are subdivided into 50-item lists.
source: https://www.edu.uwo.ca/faculty-profiles/docs/other/webb/essential-word-list.pdf
Writing for Children High Frequency (CH HF) Wordlist
245 word families from a corpus of imaginative prose comprising 174 texts and totaling 128,540 tokens. The list gave a lexical coverage of 3.4%. The 245 word families are split into 13 categories: Adjectives, Animal & Plant, Body, Clothing, Colour, Family, Food, House, Roles, School, Storytelling, Verbs, Other.
source: https://nflrc.hawaii.edu/rfl/April2019/April2019/articles/macalister.pdf#page=18
Student Engineering English Corpus (SEEC)
1200 word families from a 2 million word corpus of Engineering Mechanics, Engineering Materials, Mechanics of Materials, Mechanics of Fluids, Thermodynamics, Electrical Engineering, Engineering Drawing, Manufacturing Process and Computer Programming. The direct link lists only the first 100.
source: http://www.u.arizona.edu/~karaj/pages/Reviews/Mudraya2006.pdf#page=15
Business Word List (1st Edition)
An alphabetized 560-word list of items appearing 10 or more times in five books in Nelson’s (2000) Business English Published Materials Corpus but excluded from the GSL and AWL. Reading focus.
source: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.5133&rep=rep1&type=pdf#page=20
Science Word List
For undergraduates, especially for reading, this 318-word general science list represents 3.79% of the words in a corpus of 1.5 million words of texts from 14 different science subject areas. Further divided into six sublists.
source: https://www.victoria.ac.nz/lals/about/staff/publications/Sci_EAP_sub_lists_Coxhead_and_Hirsh.pdf
Medical Academic Word List (MAWL)
A 623-headword list of high-frequency, wide-coverage words in a corpus of more than 1 million words of academic medical research articles representing 32 medical subject areas. Excludes items from the GSL but not those from the AWL (e.g., analyze, concentrate). MAWL appeals most to graduate students pursuing medical or research degrees, for reading and writing.
Newspaper Word List (NWL)
A specialist word list of 588 word families drawn from a newspaper corpus of 579,849 words. Excludes proper names and GSL items that did not have wide range in the corpus. Items are grouped into 10 sublists according to range.
source: https://jalt-publications.org/files/pdf-article/art2_5.pdf
Business Word List (2nd Edition)
A 426-word list of the most frequent items, each appearing at least 270 times in a very large Business Research Article Corpus, ranked according to range, frequency, and coverage. Excludes words from the British National Corpus below the 3,000 level. Includes 12 mathematics- or statistics-related items and 4 computer items, as well as 6 compound words and 4 abbreviations.
Engineering Technical Word List
313 word types of semi-technical words (which have one or more general English language meanings and which in technical contexts take on extended meanings). Notable for using locally produced (Malaysian) engineering textbooks as its corpus, i.e. texts written by non-native users of English.
source: http://todayscience.org/IER/article/ier.v1i1p43.pdf#page=13
Computer Science Word List
433 headwords using Coxhead criteria. Has additional multi-word list of 23 items.
source: https://www.baleap.org/wp-content/uploads/2016/03/Daniel-Minshall.pdf
Engineering English Word List
The EEWL includes the 729 most frequent word families beyond the BNC/COCA 2000, appearing in 95 to 100 compulsory textbooks across 20 engineering sub-disciplines at least 288 times in the 4.57-million-token Engineering Textbook Corpus and making a total of 14.3% lexical coverage.
Medical Academic Vocabulary List
819 lemmas based on 2 medical corpora: a 2.7 million medical academic corpus and a 3.5 million medical textbook series corpus. A letter (“a” for adjective, “n” for noun, “v” for verb and “r” for adverb) is given after each word to indicate the part of speech being referenced. An asterisk marks those words that are also on the new-GSL (2015).
source: https://www.sciencedirect.com/science/article/pii/S1475158516300078
Business English Academic Wordlist
Based on a 16 million word Business English corpus, KKU-BE (accounting, marketing, advertising, finance, business law, tourism, economics, and management), the BEAWL contains 415 headwords and 1572 family members. The list is further divided into 7 sublists, which are arranged first by word representation across subject areas and then by word frequency.
Secondary Vocabulary List
A set of discipline-specific wordlists for secondary school education, the Secondary School Vocabulary Lists (SVL) cover eight core subjects: Biology, Chemistry, Economics, English, Geology, History, Mathematics, and Physics. The SVL also has collocation lists and phrase lists for each subject and a list of core phrases shared by all 8 subjects. The unit of analysis is the lemma, though a word-family version is also available.
source: https://www.researchgate.net/publication/334729353_TheSecondaryVocabularyList_SVLxlsx