pt.ul.fc.di.nlx.lxServiceClient
Class LXClient

java.lang.Object
  extended by pt.ul.fc.di.nlx.lxServiceClient.LXClient

public class LXClient
extends java.lang.Object

Client for the LXService, a web service that provides language technology for Portuguese.

Version:
1.0 (2008-03-07)
Author:
NLX-Natural Language and Speech Group of the University of Lisbon, Department of Informatics

Constructor Summary
LXClient(java.lang.String username)
          Creates an LXClient object.
 
Method Summary
 java.lang.String chunks(java.lang.String text)
          Segments text into sentences and paragraphs with LX-Chunker.

Marks sentence boundaries with <s>...</s> and paragraph boundaries with <p>...</p>.
Unwraps sentences split over different lines.

See: accuracy of LX-Chunker.
 java.lang.String posTags(java.lang.String text)
          Segments text into sentences and paragraphs with LX-Chunker and into lexemes with LX-Tokenizer, and annotates each token with a POS tag using LX-Tagger.

Assigns a single morpho-syntactic tag, from the tagset below, to every token.
 java.lang.String tokenizes(java.lang.String text)
          Segments text into sentences and paragraphs with LX-Chunker and into lexemes with LX-Tokenizer.

Tokenizes text into lexically relevant tokens.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LXClient

public LXClient(java.lang.String username)
Creates an LXClient object.

Parameters:
username - the username used to authenticate with the LXService; it must be registered in the LXService database of clients.
Method Detail

chunks

public java.lang.String chunks(java.lang.String text)
                        throws LXException
Segments text into sentences and paragraphs with LX-Chunker.

Marks sentence boundaries with <s>...</s> and paragraph boundaries with <p>...</p>.
Unwraps sentences split over different lines.

See: accuracy of LX-Chunker.

Parameters:
text - the text to be segmented: raw Portuguese text (at most 10K characters).
Returns:
the segmented text.
Throws:
LXException - if an error occurs.
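
The sentence and paragraph markup described above lends itself to simple post-processing. The following is a minimal sketch, not part of the LXClient API: the class ChunkMarkup, its method paragraphs and the sample string are hypothetical, and the code assumes that <s> elements are nested inside <p> elements and that the returned text carries no other markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads text marked up with <p>...</p> and <s>...</s>.
final class ChunkMarkup {
    // Non-greedy matches; DOTALL lets a paragraph or sentence span several lines.
    private static final Pattern PARAGRAPH = Pattern.compile("<p>(.*?)</p>", Pattern.DOTALL);
    private static final Pattern SENTENCE = Pattern.compile("<s>(.*?)</s>", Pattern.DOTALL);

    /** Returns one list of sentences per paragraph, in document order. */
    static List<List<String>> paragraphs(String chunked) {
        List<List<String>> result = new ArrayList<>();
        Matcher p = PARAGRAPH.matcher(chunked);
        while (p.find()) {
            List<String> sentences = new ArrayList<>();
            Matcher s = SENTENCE.matcher(p.group(1));
            while (s.find()) {
                sentences.add(s.group(1).trim());
            }
            result.add(sentences);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample in the documented markup, not real service output.
        String sample = "<p><s>Olá.</s><s>Tudo bem?</s></p><p><s>Até logo.</s></p>";
        System.out.println(paragraphs(sample));
    }
}
```

With the client library on the classpath, the string passed to paragraphs would be the value returned by chunks. If the actual output nests or spaces the elements differently, the two patterns need adjusting.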

tokenizes

public java.lang.String tokenizes(java.lang.String text)
                           throws LXException
Segments text into sentences and paragraphs with LX-Chunker and into lexemes with LX-Tokenizer.

Tokenizes text into lexically relevant tokens.
See: accuracy of LX-Chunker and LX-Tokenizer.

Parameters:
text - the text to be segmented: raw Portuguese text (at most 10K characters).
Returns:
the segmented text.
Throws:
LXException - if an error occurs.
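
Because each call accepts at most 10K characters, longer documents have to be split before they are submitted. The sketch below shows one possible way to do this on the caller's side; it is not part of the LXClient API. The class InputSplitter and its methods are hypothetical, "10K" is read as 10,000 Java chars (an assumption), and paragraphs are assumed to be separated by blank lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits raw text into pieces that fit the documented size limit.
final class InputSplitter {
    // Assumption: "10K characters" means 10,000 chars; adjust if the service counts differently.
    static final int MAX_CHARS = 10_000;

    /** Groups whole paragraphs into pieces of at most maxChars characters. */
    static List<String> split(String text, int maxChars) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : text.split("\\n\\s*\\n")) {
            // A single paragraph longer than the limit is cut at fixed offsets,
            // which may fall in the middle of a sentence.
            while (paragraph.length() > maxChars) {
                flush(current, pieces);
                pieces.add(paragraph.substring(0, maxChars));
                paragraph = paragraph.substring(maxChars);
            }
            // The +2 accounts for the blank line re-inserted between paragraphs.
            if (current.length() > 0 && current.length() + 2 + paragraph.length() > maxChars) {
                flush(current, pieces);
            }
            if (current.length() > 0) {
                current.append("\n\n");
            }
            current.append(paragraph);
        }
        flush(current, pieces);
        return pieces;
    }

    // Moves the accumulated text, if any, into the result list.
    private static void flush(StringBuilder current, List<String> pieces) {
        if (current.length() > 0) {
            pieces.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> pieces = split("Primeiro parágrafo.\n\nSegundo parágrafo.", MAX_CHARS);
        System.out.println(pieces.size() + " piece(s)");
    }
}
```

Each piece would then be passed separately to chunks, tokenizes or posTags. Splitting on paragraph boundaries keeps sentences intact except where a single paragraph exceeds the limit on its own.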

posTags

public java.lang.String posTags(java.lang.String text)
                         throws LXException
Segments text into sentences and paragraphs with LX-Chunker and into lexemes with LX-Tokenizer, and annotates each token with a POS tag using LX-Tagger.

Assigns a single morpho-syntactic tag, from the tagset below, to every token.

Tagset: POS
Tag            Category                                     Examples
ADJ            Adjectives                                   bom, brilhante, eficaz, …
ADV            Adverbs                                      hoje, já, sim, felizmente, …
CARD           Cardinals                                    zero, dez, cem, mil, …
CJ             Conjunctions                                 e, ou, tal como, …
CL             Clitics                                      o, lhe, se, …
CN             Common Nouns                                 computador, cidade, ideia, …
DA             Definite Articles                            o, os, …
DEM            Demonstratives                               este, esses, aquele, …
DFR            Denominators of Fractions                    meio, terço, décimo, %, …
DGTR           Roman Numerals                               VI, LX, MMIII, MCMXCIX, …
DGT            Digits                                       0, 1, 42, 12345, 67890, …
DM             Discourse Marker                             olá, …
EADR           Electronic Addresses                         http://www.di.fc.ul.pt, …
EOE            End of Enumeration                           etc
EXC            Exclamative                                  ah, ei, etc.
GER            Gerunds                                      sendo, afirmando, vivendo, …
GERAUX         Gerund "ter"/"haver" in compound tenses      tendo, havendo, …
IA             Indefinite Articles                          uns, umas, …
IND            Indefinites                                  tudo, alguém, ninguém, …
INF            Infinitive                                   ser, afirmar, viver, …
INFAUX         Infinitive "ter"/"haver" in compound tenses  ter, haver, …
INT            Interrogatives                               quem, como, quando, …
ITJ            Interjection                                 bolas, caramba, …
LTR            Letters                                      a, b, c, …
MGT            Magnitude Classes                            unidade, dezena, dúzia, resma, …
MTH            Months                                       Janeiro, Dezembro, …
NP             Noun Phrases                                 idem, …
ORD            Ordinals                                     primeiro, centésimo, penúltimo, …
PADR           Part of Address                              Rua, av., rot., …
PNM            Part of Name                                 Lisboa, António, João, …
PNT            Punctuation Marks                            ., ?, (, …
POSS           Possessives                                  meu, teu, seu, …
PPA            Past Participles not in compound tenses      sido, afirmados, vivida, …
PP             Prepositional Phrases                        algures, …
PPT            Past Participle in compound tenses           sido, afirmado, vivido, …
PREP           Prepositions                                 de, para, em redor de, …
PRS            Personals                                    eu, tu, ele, …
QNT            Quantifiers                                  todos, muitos, nenhum, …
REL            Relatives                                    que, cujo, tal que, …
STT            Social Titles                                Presidente, drª., prof., …
SYB            Symbols                                      @, #, &, …
TERMN          Optional Terminations                        (s), (as), …
UM             "um" or "uma"                                um, uma
UNIT           Abbreviated Measurement Units                kg., km., etc.
VAUX           Finite "ter" or "haver" in compound tenses   temos, haveriam, …
V              Verbs (other than PPA, PPT, INF or GER)      falou, falaria, …
WD             Week Days                                    segunda, terça-feira, sábado, …

Multi-Word Expressions
LADV1…LADVn    Multi-Word Adverbs                           de facto, em suma, um pouco, …
LCJ1…LCJn      Multi-Word Conjunctions                      assim como, já que, …
LDEM1…LDEMn    Multi-Word Demonstratives                    o mesmo, …
LDFR1…LDFRn    Multi-Word Denominators of Fractions         por cento
LDM1…LDMn      Multi-Word Discourse Markers                 pois não, até logo, …
LITJ1…LITJn    Multi-Word Interjections                     meu Deus
LPRS1…LPRSn    Multi-Word Personals                         a gente, si mesmo, V. Exa., …
LPREP1…LPREPn  Multi-Word Prepositions                      através de, a partir de, …
LQD1…LQDn      Multi-Word Quantifiers                       uns quantos, …
LREL1…LRELn    Multi-Word Relatives                         tal como, …
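
The multi-word tags follow a regular naming scheme: the letter L, a base category and a number from 1 to n. The sketch below recognises that scheme in a bare tag string. It is only an illustration: the class MultiWordTag is hypothetical, the reading of the number as the token's position within the expression is an assumption suggested by the 1…n notation, and how tags are attached to tokens in the text returned by posTags is not specified here and is not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a multi-word tag such as LADV2 into base and number.
final class MultiWordTag {
    // L + one of the multi-word base categories listed in the tagset + a number from 1 up.
    // Plain tags such as LTR carry no number and therefore do not match.
    private static final Pattern MWE =
            Pattern.compile("L(ADV|CJ|DEM|DFR|DM|ITJ|PRS|PREP|QD|REL)([1-9]\\d{0,2})");

    final String base;   // e.g. "ADV" for LADV2
    final int position;  // assumed to be the 1-based position inside the expression

    private MultiWordTag(String base, int position) {
        this.base = base;
        this.position = position;
    }

    /** Returns the parsed tag, or an empty Optional if the tag is not a multi-word tag. */
    static Optional<MultiWordTag> parse(String tag) {
        Matcher m = MWE.matcher(tag);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new MultiWordTag(m.group(1), Integer.parseInt(m.group(2))));
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"LADV1", "LPREP3", "LTR", "ADV"}) {
            String described = parse(tag)
                    .map(t -> t.base + " #" + t.position)
                    .orElse("not a multi-word tag");
            System.out.println(tag + " -> " + described);
        }
    }
}
```

Listing the base categories explicitly, rather than accepting any capital letters after the L, keeps the pattern from matching strings that are not in the tagset.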


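Tagset: Other tags
Tag    Description
m      Masculine
f      Feminine
s      Singular
p      Plural
dim    Diminutive
sup    Superlative
comp   Comparative
1      First Person
2      Second Person
3      Third Person
pi     Presente do Indicativo (present indicative)
ppi    Pretérito Perfeito do Indicativo (preterite indicative)
ii     Pretérito Imperfeito do Indicativo (imperfect indicative)
mpi    Pretérito Mais que Perfeito do Indicativo (pluperfect indicative)
fi     Futuro do Indicativo (future indicative)
c      Condicional (conditional)
pc     Presente do Conjuntivo (present subjunctive)
ic     Pretérito Imperfeito do Conjuntivo (imperfect subjunctive)
fc     Futuro do Conjuntivo (future subjunctive)
imp    Imperativo (imperative)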

See: accuracy of LX-Chunker, LX-Tokenizer and LX-Tagger.

Parameters:
text - the text to be POS tagged: raw Portuguese text (at most 10K characters).
Returns:
the tagged text.
Throws:
LXException - if an error occurs.