from_text

ucca.convert.from_text(text, passage_id='1', tokenized=False, one_per_line=False, extra_format=None, lang='en', return_text=False, *args, **kwargs)

Converts text strings (raw or already tokenized) to Passage objects.

Parameters:
  • text – a multi-line string or a sequence of strings: each line will be a new paragraph, and blank lines separate passages
  • passage_id – prefix of the ID to set for the returned passages
  • tokenized – whether the text is already given as a list of tokens
  • one_per_line – if true, each line becomes a new passage rather than just a new paragraph
  • extra_format – value to set in passage.extra["format"]
  • lang – language of the tokenization model to use
  • return_text – whether to return the original text along with each passage, rather than only the passage itself
Returns:

generator of Passage objects, each containing only Terminal units
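The sketch below illustrates the line-grouping behavior described above. It is standalone and does not call UCCA. It assumes that each non-blank line becomes a paragraph, that a blank line closes the current passage, and that `one_per_line` makes every line its own passage. `split_passages` is a hypothetical helper. Passages are modeled as plain lists of token lists rather than `Passage` objects. Whitespace splitting stands in for the language-specific tokenizer selected by `lang`.

```python
from typing import Iterable, Iterator, List, Union

# Hypothetical stand-in for a Passage: a list of paragraphs, each a list of tokens.
Paragraphs = List[List[str]]


def split_passages(text: Union[str, Iterable[str]],
                   one_per_line: bool = False) -> Iterator[Paragraphs]:
    """Illustrative sketch (assumed semantics, not UCCA's implementation).

    Each non-blank line becomes a paragraph; a blank line closes the current
    passage. With one_per_line=True, every non-blank line is its own passage.
    Tokenization is plain str.split(), not a language-specific model.
    """
    lines = text.splitlines() if isinstance(text, str) else list(text)
    paragraphs: Paragraphs = []
    for line in lines:
        line = line.strip()
        if line:
            paragraphs.append(line.split())  # one paragraph per non-blank line
        if paragraphs and (not line or one_per_line):
            yield paragraphs  # a blank line (or every line) ends the passage
            paragraphs = []
    if paragraphs:  # flush the last passage if the text has no trailing blank line
        yield paragraphs


doc = "The cat sat.\nIt purred.\n\nA new passage."
for passage in split_passages(doc):
    print(passage)
# [['The', 'cat', 'sat.'], ['It', 'purred.']]
# [['A', 'new', 'passage.']]
```

Because the sketch is a generator, like the one `from_text` returns, passages are produced lazily. A large input can therefore be processed one passage at a time without first building the full list.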