This code is the lexical complexity analyzer described in
Lu, Xiaofei (2012). The relationship of lexical richness to the quality
of ESL speakers' oral narratives. The Modern Language Journal, 96(2), 190-208.
Version 1.1 Released on February 12, 2013
Copyright (C) 2013 Xiaofei Lu
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
To download the latest version of this software, follow the appropriate link
at
http://www.personal.psu.edu/xxl13/download.html
1. About
This tool computes the lexical complexity of English texts using 25 different
measures. Information on the measures can be found in Lu (2012). This
tool uses frequency lists derived from the British National Corpus and the
American National Corpus.
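Lu (2012) defines each measure precisely; as a rough illustration of the kind
of computation involved, the minimal Python sketch below derives two of the
simplest measures, number of different words (NDW) and type-token ratio (TTR),
from a file in the lemma_pos format described in section 2.1. The sketch is
illustrative only and is not the tool's actual implementation.

import sys

def read_lemmas(path):
    # Read a lemma_pos file and return the list of lemmas (tokens).
    lemmas = []
    with open(path) as f:
        for line in f:
            for item in line.split():
                # Each item looks like "lemma_pos"; keep the lemma part.
                lemma, _, pos = item.rpartition('_')
                if lemma:
                    lemmas.append(lemma.lower())
    return lemmas

if __name__ == '__main__':
    tokens = read_lemmas(sys.argv[1])
    ndw = len(set(tokens))       # number of different words
    ttr = ndw / len(tokens)      # type-token ratio
    print('NDW = %d, TTR = %.4f' % (ndw, ttr))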
2. Running the tool
2.1 Input files: All input files must be POS-tagged and lemmatized first and
must be in the following format (see the files in the samples folder for
examples). Each file should contain a minimum of 50 words.
lemma_pos lemma_pos lemma_pos ...
or
lemma_pos
lemma_pos
lemma_pos
You can use any POS tagger and lemmatizer, as long as the Penn Treebank POS
tagset is adopted and the input file is appropriately formatted. In Lu
(2012), the following POS tagger and lemmatizer were used:
The Stanford POS tagger:
http://nlp.stanford.edu/software/tagger.shtml
MORPHA:
http://www.informatics.susx.ac.uk/research/groups/nlp/carroll/morph.html
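If you use a different tagger or lemmatizer, a quick sanity check along the
following lines (a hypothetical helper, not part of this tool) can confirm
that a file matches the lemma_pos format and meets the 50-word minimum before
you run the analyzer:

import sys

def check_format(path, min_words=50):
    # Verify that every token looks like lemma_pos and that the file
    # contains at least min_words tokens.
    count = 0
    with open(path) as f:
        for n, line in enumerate(f, 1):
            for item in line.split():
                if '_' not in item:
                    print('line %d: malformed token %r' % (n, item))
                    return False
                count += 1
    if count < min_words:
        print('only %d words; at least %d are required' % (count, min_words))
        return False
    return True

if __name__ == '__main__':
    sys.exit(0 if check_format(sys.argv[1]) else 1)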
2.2 Analyzing a single file: To get the lexical complexity of a single file,
run the following from this directory. Replace input_file with the actual
name of your input file and output_file with the desired name of your output
file.
python lc.py input_file > output_file
e.g.,
python lc.py samples/1.lem > 1.lex
To use the American National Corpus (ANC) wordlist instead of the BNC wordlist
for lexical sophistication analysis, use the lc-anc.py script, e.g.,
python lc-anc.py samples/1.lem > 1-anc.lex
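The scripts can also be driven from another Python program instead of the
shell. The sketch below (which assumes lc.py is in the current directory and
that the 'python' on your path is the interpreter these scripts require)
captures the output of a single run rather than redirecting it:

import subprocess

# Run lc.py on one input file and capture its comma-delimited output.
result = subprocess.run(
    ['python', 'lc.py', 'samples/1.lem'],
    capture_output=True, text=True, check=True)
with open('1.lex', 'w') as out:
    out.write(result.stdout)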
2.3 Analyzing multiple files: To get the lexical complexity of two or more
files within a single folder, run the following from this directory. Replace
path_to_folder with the actual path to the folder that contains your files
and output_file with the desired name of your output file. The folder should
only contain the files you want to analyze.
python folder-lc.py path_to_folder > output_file
e.g.,
python folder-lc.py samples/ > samples.lex
To use the American National Corpus (ANC) wordlist instead of the BNC wordlist
for lexical sophistication analysis, use the folder-lc-anc.py script, e.g.,
python folder-lc-anc.py samples/ > samples-anc.lex
2.4 Using the output: The output file is comma-delimited and can be loaded
directly into Excel or SPSS for analysis.
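The output can also be read programmatically; here is a minimal sketch using
Python's standard csv module (the exact column names depend on the header
row the tool writes):

import csv

# Read the comma-delimited output produced by folder-lc.py.
with open('samples.lex', newline='') as f:
    for row in csv.DictReader(f):
        print(row)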