linguistic-features-in-text

LiFT is a library for extracting linguistic features from textual data.

View the Project on GitHub zesch/linguistic-features-in-text

Linguistic Features in Text (LiFT)

LiFT is currently maintained by:

First steps

See: First Steps with LiFT

Philosophy

We rely on a UIMA CAS representation model based on the DKPro Core type system and preprocessing components. This makes LiFT multilingual: in principle it supports every language covered by DKPro Core, although not every structure may be available for every language.

LiFT distinguishes between linguistic structures (lemmas, POS tags, syllables, spelling errors, etc.) and features, which are computed from these structures. Structures are represented in the document model and can be visualized. Features are numeric values that describe properties of the document. For example, SpellingErrorRatio may have a value of 0.06, meaning that 6% of all tokens in the text contain a spelling error.
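The split between structures and features can be illustrated with a small, language-neutral sketch. It does not use LiFT, UIMA, or DKPro Core: the `Token` and `SpellingError` classes, the whitespace tokenizer, the lexicon lookup, and all function names are assumptions made up for this illustration. The only thing taken from the text above is the idea that a structure is a span stored in the document model, while a feature such as SpellingErrorRatio is a number derived from those spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical structure: one token span in the document."""
    begin: int
    end: int


@dataclass(frozen=True)
class SpellingError:
    """Hypothetical structure: marks one token span as misspelled."""
    begin: int
    end: int


def tokenize(text):
    """Whitespace tokenizer standing in for real preprocessing components."""
    tokens = []
    pos = 0
    for word in text.split():
        begin = text.index(word, pos)
        end = begin + len(word)
        tokens.append(Token(begin, end))
        pos = end
    return tokens


def annotate_spelling_errors(text, tokens, lexicon):
    """Create one SpellingError structure per token missing from the lexicon."""
    errors = []
    for token in tokens:
        word = text[token.begin:token.end].strip(".,;:!?").lower()
        if word and word not in lexicon:
            errors.append(SpellingError(token.begin, token.end))
    return errors


def spelling_error_ratio(tokens, errors):
    """Feature: share of tokens that carry a spelling error structure."""
    if not tokens:
        return 0.0
    return len(errors) / len(tokens)


# Toy document and lexicon, both invented for this example.
text = "This text has erors"
lexicon = {"this", "text", "has", "errors"}

tokens = tokenize(text)                                   # structures
errors = annotate_spelling_errors(text, tokens, lexicon)  # structures
ratio = spelling_error_ratio(tokens, errors)              # feature

print(ratio)  # 0.25: one of four tokens is misspelled
```

Keeping the error spans as separate annotations, rather than computing the ratio directly, is what lets the same structures be visualized or reused by other features.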

The project is under heavy development, but we are working towards a stable release.

We plan to implement the following types of structures:

We also support various meta-features of linguistic complexity: