API Documentation

Getting Started

To load UCCA passages from XML files, manipulate them and write to files, use the following code template:

from ucca.ioutil import get_passages_with_progress_bar, write_passage
for passage in get_passages_with_progress_bar(filenames):
    ...
    write_passage(passage)

Each passage instantiates the ucca.core.Passage class.

XML files can be downloaded from the various UCCA corpora.

ucca.constructions Module

Functions

add_argument(argparser[, default])
create_category_construction(tag)
create_passage_yields(p, *args[, tags]) :param p: passage to find terminal yields of
diff_terminals(*passages)
extract_candidates(passage[, constructions, …]) Find candidate edges by constructions in UCCA passage.
get_by_name(name)
get_by_names([names])
positions(terminals)
terminal_ids(passage)
verify_terminals_match(passage, reference)

Classes

Candidate(edge[, reference, …])
Categories()
Construction(name, description, criterion[, …])
EdgeTags Layer 1 Edge tags.
NodeTags Layer 1 Node tags.
OrderedDict Dictionary that remembers insertion order
chain chain(*iterables) –> chain object

Class Inheritance Diagram

Inheritance diagram of ucca.constructions.Candidate, ucca.constructions.Categories, ucca.constructions.Construction

ucca.convert Module

Converter module between different UCCA annotation formats.

This module contains utilities to convert UCCA annotation between different formats. All conversions go to or from the core.Passage form, which acts as a pivot.

The possible other formats are:
site XML
standard XML
conll (CoNLL-X dependency parsing shared task)
sdp (SemEval 2015 semantic dependency parsing shared task)

Functions

attach_punct(l0, l1)
file2passage(filename) Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle. :param filename: file name to read from
from_json(lines, *args[, …]) Convert text (or dict) in UCCA-App JSON format to a Passage object.
from_site(elem) Converts site XML structure to core.Passage object.
from_standard(root[, extra_funcs])
from_text(text[, passage_id, tokenized, …]) Converts from tokenized strings to a Passage object.
get_categories_details(d)
get_json_attrib(d)
join_passages(passages[, passage_id, remarks]) Join passages to one passage with all the nodes in order :param passages: sequence of passages to join :param passage_id: ID of newly created passage (otherwise, ID of first passage) :param remarks: add original node ID as remarks to the new nodes :return: joined passage
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML)
pickle2passage(filename)
split2paragraphs(passage[, remarks, lang, ids])
split2segments(passage, is_sentences[, …]) Split passage to sub-passages :param passage: Passage object :param is_sentences: if True, split to sentences; otherwise, paragraphs :param remarks: Whether to add remarks with original node IDs :param lang: language to use for sentence splitting model :param ids: optional iterable of ids to set passage IDs for each split :return: sequence of passages
split2sentences(passage[, remarks, lang, ids])
split_passage(passage, ends[, remarks, ids, …]) Split the passage on the given terminal positions :param passage: passage to split :param ends: sequence of positions at which the split passages will end :param remarks: add original node ID as remarks to the new nodes :param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split :param suffix_format: in case ids is None, use this format for the running index suffix :param suffix_start: in case ids is None, use this starting index for the running index suffix :return: sequence of passages
to_json(passage, *args[, return_dict, …]) Convert a Passage object to text (or dict) in UCCA-App JSON :param passage: the Passage object to convert :param return_dict: whether to return dict rather than list of lines :param tok_task: either None (to do tokenization too), or a completed tokenization task dict with token IDs, or True, to indicate that the function should do only tokenization and not annotation :param all_categories: list of category dicts so that IDs can be added, if available - otherwise names are used :param skip_category_mapping: if False, translate edge tag abbreviations to category names; if True, don’t :return: list of lines in JSON format if return_dict=False, or task dict if True
to_sequence(passage) Converts from a Passage object to linearized text sequence.
to_site(passage) Converts a passage to the site XML format.
to_standard(passage) Converts a Passage object to a standard XML root element.
to_text(passage[, sentences, lang]) Converts from a Passage object to tokenized strings.
xml2passage(filename)
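
The splitting described for split_passage can be pictured on a plain token list. The following is a minimal standalone sketch, not the library's implementation: it works on strings rather than Passage objects, and the name split_at_ends, the default "%03d" suffix format, and the reading of ends as 1-based inclusive positions are all assumptions made for illustration.

```python
def split_at_ends(tokens, ends, passage_id="p", ids=None,
                  suffix_format="%03d", suffix_start=0):
    """Split tokens into consecutive chunks, one per position in ends.

    Positions are assumed to be 1-based and inclusive (an assumption of
    this sketch). Returns a list of (chunk_id, chunk_tokens) pairs.
    """
    ids = None if ids is None else list(ids)
    if ids is not None and len(ids) != len(ends):
        raise ValueError("ids must have the same length as ends")
    chunks = []
    start = 0
    for i, end in enumerate(ends):
        if ids is not None:
            chunk_id = ids[i]
        else:
            # No explicit IDs: append a running index suffix to the passage ID
            chunk_id = passage_id + suffix_format % (suffix_start + i)
        chunks.append((chunk_id, tokens[start:end]))
        start = end
    return chunks


tokens = "He left . She stayed .".split()
parts = split_at_ends(tokens, ends=[3, 6])
```

Here parts holds two chunks, one per entry in ends. Passing ids overrides the generated suffixes.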

Classes

EdgeTags Layer 1 Edge tags.
JSONDecodeError(msg, doc, pos) Subclass of ValueError with the following additional properties:
SiteCfg Contains static configuration for conversion to/from the site XML.
SiteUtil Contains utility functions for converting to/from the site XML.
SiteXMLUnknownElement
attrgetter attrgetter(attr, …) –> attrgetter object
defaultdict defaultdict(default_factory[, …]) –> dict with default factory
groupby(iterable[, key]) Make an iterator that returns consecutive keys and groups from the iterable.
itemgetter itemgetter(item, …) –> itemgetter object
repeat(object [,times]) Create an iterator which returns the object for the specified number of times.

Class Inheritance Diagram

Inheritance diagram of ucca.convert.SiteCfg, ucca.convert.SiteUtil, ucca.convert.SiteXMLUnknownElement

ucca.core Module

This module encapsulates the basic elements of the UCCA annotation.

A UCCA annotation is essentially a directed acyclic graph (DAG), which represents a Passage of text and its annotation. The annotation itself is divided into Layer objects. Within each layer, Node objects are connected to each other, and to Nodes in other layers, by Edge objects.
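
To make the layered graph structure concrete, here is a minimal standalone sketch. It does not use the ucca package, and the classes below are simplified stand-ins, not the real core.Node, core.Edge or core.Layer. The "layer.unique" ID format, the "FN" and "T" tags, and the connect helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Node:
    ID: str                      # assumed format: "<layer>.<unique id>"
    tag: str
    outgoing: List["Edge"] = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    parent: Node
    child: Node
    tag: str


@dataclass(eq=False)
class Layer:
    ID: str
    nodes: Dict[str, Node] = field(default_factory=dict)


def connect(parent, child, tag):
    """Add a labeled edge from parent to child and return it (hypothetical helper)."""
    edge = Edge(parent, child, tag)
    parent.outgoing.append(edge)
    return edge


# One terminal in layer "0" and one unit in layer "1" that spans it
terminals = Layer("0")
foundational = Layer("1")
word = Node("0.1", "Word")
unit = Node("1.1", "FN")         # illustrative tag
terminals.nodes[word.ID] = word
foundational.nodes[unit.ID] = unit
edge = connect(unit, word, "T")  # illustrative edge tag; the edge crosses layers
```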

Functions

edge_id_orderkey(edge) Key function which sorts Edges by their IDs (using id_orderkey()).
id_orderkey(node) Key function which sorts by layer (string), then by unique ID (int).
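
The ordering described for id_orderkey (layer as a string, then unique ID as an integer) can be illustrated as follows. This is a sketch under assumptions: it takes ID strings rather than Node objects, and it assumes IDs have the form "<layer>.<unique id>" with a "." separator. The name id_orderkey_sketch is hypothetical.

```python
def id_orderkey_sketch(node_id):
    """Return a (layer, unique_id) sort key for an ID such as "1.12"."""
    layer, unique = node_id.split(".")
    return layer, int(unique)


ids = ["1.10", "0.3", "1.2", "0.12"]
ordered = sorted(ids, key=id_orderkey_sketch)
# ordered == ["0.3", "0.12", "1.2", "1.10"]
```

Comparing the unique part as an integer is what places "1.2" before "1.10". A plain string sort would reverse them.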

Classes

Category(tag[, slot, layer, parent]) When considering refinement layers, each edge can have multiple tags sorted in a certain hierarchy.
DuplicateIdError Exception raised when trying to add an element with an existing ID.
Edge(root, parent, child[, tag, attrib]) Labeled edge between two Node objects in UCCA annotation graph.
FrozenPassageError Exception raised when trying to modify a frozen Passage.
Layer(ID, root[, attrib, orderkey]) Group of similar Node objects in UCCA annotation graph.
MissingNodeError Exception raised when trying to access a non-existent Node.
ModifyPassage(fn) Decorator for changing a Passage or any member of it.
Node(ID, root, tag[, attrib, orderkey]) Labeled Node in UCCA annotation graph.
Passage(ID[, attrib]) An annotated text with UCCA annotation graph.
UCCAError Base class for all UCCA package exceptions.
UnimplementedMethodError Exception raised when trying to call a not-yet-implemented method.

Class Inheritance Diagram

Inheritance diagram of ucca.core.Category, ucca.core.DuplicateIdError, ucca.core.Edge, ucca.core.FrozenPassageError, ucca.core.Layer, ucca.core.MissingNodeError, ucca.core.ModifyPassage, ucca.core.Node, ucca.core.Passage, ucca.core.UCCAError, ucca.core.UnimplementedMethodError

ucca.diffutil Module

Functions

diff_passages(true_passage, pred_passage[, …]) Debug method to print missing or mistaken attributes, nodes and edges
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML)

ucca.evaluation Module

The evaluation library for UCCA layer 1.

v1.4
2016-12-25: move common Fs to root before evaluation
2017-01-04: flatten centers, do not add 1 (for root) to mutual
2017-01-16: fix bug in moving common Fs
2018-04-12: exclude punctuation nodes regardless of edge tag
2018-12-11: fix another bug in moving common Fs
2019-01-22: support multiple categories per edge
2019-11-29: evaluate implicit nodes too (by their parent’s yield)

Functions

create_passage_yields(p, *args[, tags]) :param p: passage to find terminal yields of
evaluate(guessed, ref[, converter, verbose, …]) Compare two passages and return requested diagnostics and scores, possibly printing them too.
expand_equivalents(tag_set) Returns a set of all the tags in the tag set or those equivalent to them :param tag_set: set of tags (strings) to expand
get_by_names([names])
get_text(p, positions)
get_yield(unit)
move_functions(p1, p2) Move any common Fs to the root
print_tags_and_text(p, yield_tags)
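
As a rough illustration of comparing a guessed passage with a reference by terminal yields, the sketch below scores two sets of (tag, yield) pairs. It is a simplified standalone example, not the logic of evaluate(): it skips the preprocessing listed in the changelog (moving common Fs, excluding punctuation, and so on), and the function name prf and the toy data are hypothetical.

```python
def prf(guessed, ref):
    """Precision, recall and F1 over two sets of (tag, yield) pairs."""
    matches = len(guessed & ref)
    precision = matches / len(guessed) if guessed else 0.0
    recall = matches / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matches else 0.0
    return precision, recall, f1


# Each unit is (edge tag, tuple of terminal positions it spans)
guessed = {("A", (1,)), ("P", (2,)), ("A", (3, 4))}
ref = {("A", (1,)), ("P", (2,)), ("D", (3,)), ("A", (4,))}
precision, recall, f1 = prf(guessed, ref)
```

Two units match exactly, so precision is 2/3 and recall is 2/4 in this toy case.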

Classes

Counter(**kwds) Dict subclass for counting hashable items.
EdgeTags Layer 1 Edge tags.
Evaluator(verbose, constructions, units, …)
EvaluatorResults(results[, default])
NodeTags Layer 1 Node tags.
OrderedDict Dictionary that remembers insertion order
Scores(evaluator_results[, name, …])
SummaryStatistics(num_matches, …[, errors])
attrgetter attrgetter(attr, …) –> attrgetter object
groupby(iterable[, key]) Make an iterator that returns consecutive keys and groups from the iterable.

Class Inheritance Diagram

Inheritance diagram of ucca.evaluation.Evaluator, ucca.evaluation.EvaluatorResults, ucca.evaluation.Scores, ucca.evaluation.SummaryStatistics

ucca.ioutil Module

Input/output utility functions for UCCA scripts.

Functions

contextmanager(func) @contextmanager decorator.
external_write_mode(*args, **kwargs)
file2passage(filename) Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle. :param filename: file name to read from
from_text(text[, passage_id, tokenized, …]) Converts from tokenized strings to a Passage object.
gen_files(files_and_dirs) :param files_and_dirs: iterable of files and/or directories to look in
get_passages(filename_patterns, **kwargs)
get_passages_with_progress_bar(filename_patterns)
glob(pathname, *[, recursive]) Return a list of paths matching a pathname pattern.
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML)
read_files_and_dirs(files_and_dirs[, …]) :param files_and_dirs: iterable of files and/or directories to look in
resolve_patterns(filename_patterns)
split2segments(passage, is_sentences[, …]) Split passage to sub-passages :param passage: Passage object :param is_sentences: if True, split to sentences; otherwise, paragraphs :param remarks: Whether to add remarks with original node IDs :param lang: language to use for sentence splitting model :param ids: optional iterable of ids to set passage IDs for each split :return: sequence of passages
to_text(passage[, sentences, lang]) Converts from a Passage object to tokenized strings.
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

Classes

LazyLoadedPassages(files[, sentences, …]) Iterable interface to Passage objects that loads files on-the-go and can be iterated more than once
ParseError
Passage(ID[, attrib]) An annotated text with UCCA annotation graph.
chain chain(*iterables) –> chain object
defaultdict defaultdict(default_factory[, …]) –> dict with default factory
filterfalse filterfalse(function or None, sequence) –> filterfalse object
tqdm([iterable, desc, total, leave, file, …]) Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.
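
The idea behind LazyLoadedPassages, an iterable that loads each file only when it is reached and can be iterated more than once, can be sketched without the ucca package. The class ReiterableLoader and its loader argument are hypothetical names for this illustration, and the real class takes different parameters.

```python
class ReiterableLoader:
    """Iterable that calls loader(file) lazily, anew on every iteration."""

    def __init__(self, files, loader):
        self.files = list(files)   # keep the names so iteration can restart
        self.loader = loader

    def __iter__(self):
        # A fresh generator per call, so the object is not exhausted after one pass
        for name in self.files:
            yield self.loader(name)

    def __len__(self):
        return len(self.files)


calls = []


def fake_loader(name):
    """Stand-in for reading a passage from disk; records each call."""
    calls.append(name)
    return name.upper()


passages = ReiterableLoader(["a.xml", "b.xml"], fake_loader)
first = list(passages)
second = list(passages)
```

Nothing is loaded until iteration begins, and each pass loads every file again. This trades repeated I/O for low memory use.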

Class Inheritance Diagram

Inheritance diagram of ucca.ioutil.LazyLoadedPassages

ucca.layer0 Module

Encapsulates the layer of all word and punctuation symbols.

Layer 0 is the basic layer for all the UCCA annotation, as it includes the actual words and punctuation marks found in the core.Passage.

Layer 0 has only one type of node, Terminal. This is a subtype of core.Node, and can have one of two tags: Word or Punctuation.

Functions

is_punct(node) Returns whether the unit is a layer0 punctuation (for all Units).

Classes

Layer0(root[, attrib]) Represents the Terminal objects layer.
NodeTags
Terminal(ID, root, tag[, attrib, orderkey]) Layer 0 Node type, represents a word or a punctuation mark.

Class Inheritance Diagram

Inheritance diagram of ucca.layer0.Layer0, ucca.layer0.NodeTags, ucca.layer0.Terminal

ucca.layer1 Module

Describes the foundational level elements (layer 1) of the UCCA annotation.

Layer 1 is the foundational layer of UCCA, whose Nodes and Edges represent scene objects and relations. The basic building blocks of this layer are the FNode, which is a participant in a scene relation (including the relation itself), and the various Edges between these Nodes, which represent the type of relation between the Nodes.

Classes

EdgeTags Layer 1 Edge tags.
FoundationalNode(ID, root, tag[, attrib, …]) The basic building block of UCCA annotation, represents semantic units.
Layer1(root[, attrib, orderkey])
Linkage(ID, root, tag[, attrib, orderkey]) A Linkage between parallel scenes.
MissingRelationError Exception raised when a required edge is not present.
NodeTags Layer 1 Node tags.
PunctNode(ID, root, tag[, attrib, orderkey]) Encapsulates punctuation layer0.Terminal objects.

Class Inheritance Diagram

Inheritance diagram of ucca.layer1.EdgeTags, ucca.layer1.FoundationalNode, ucca.layer1.Layer1, ucca.layer1.Linkage, ucca.layer1.MissingRelationError, ucca.layer1.NodeTags, ucca.layer1.PunctNode

ucca.normalization Module

Functions

attach_punct(l0, l1)
attach_terminals(l0, l1)
copy_edge(edge[, parent, child, tag, attrib])
destroy(node_or_edge)
detach_punct(l1)
flatten_centers(node) Whenever there are Cs inside Cs, remove the external C.
flatten_functions(node) Whenever there is an F as an only child, remove it.
flatten_participants(node) Whenever there is an A as an only child, remove it.
fparent(node_or_edge)
lowest_common_ancestor(*nodes)
move_elements(node, tags, parent_tags[, forward])
move_scene_elements(node)
move_sub_scene_elements(node)
nearest_parent(l0, *terminals)
nearest_word(l0, position, step)
normalize(passage[, extra])
normalize_node(node, l1, extra)
reattach_punct(l0, l1)
reattach_terminals(l0, l1)
remove(parent, child)
remove_unmarked_implicits(node)
replace_center(edge)
replace_edge_tags(node)
separate_scenes(node, l1[, top_level])
split_coordinated_main_rel(node, l1)
traverse_up_centers(node)

ucca.textutil Module

Utility functions for UCCA package.

Functions

annotate(passage, *args, **kwargs) Run spaCy pipeline on the given passage, unless already annotated :param passage: Passage object, whose layer 0 nodes will get entries added to the 'extra' dict
annotate_all(passages[, replace, as_array, …]) Run spaCy pipeline on the given passages, unless already annotated :param passages: iterable of Passage objects, whose layer 0 nodes will get entries added to the 'extra' dict :param replace: even if a given passage is already annotated, replace with new annotation :param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra[“doc”] to array of ids :param as_extra: set 'extra' entries to each terminal :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is :param lang: optional two-letter language code, will be overridden if passage has “lang” attrib :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model :param verbose: whether to print annotated text :return: generator of annotated passages, which are actually modified in-place (same objects as input)
annotate_as_tuples(passages[, replace, …])
break2paragraphs(passage[, return_terminals]) Breaks into paragraphs according to the annotation.
break2sentences(passage[, lang]) Breaks paragraphs into sentences according to the annotation.
contextmanager(func) @contextmanager decorator.
external_write_mode(*args, **kwargs)
extract_terminals(p) returns an iterator of the terminals of the passage p
get_lang(passage_context)
get_nlp([lang]) Load spaCy model for a given language, determined by 'models' dict or by MODEL_ENV_VAR
get_tokenizer([tokenized, lang])
get_vocab([vocab, lang])
get_word_vectors([dim, size, filename, vocab]) Get word vectors from spaCy model or from text file :param dim: dimension to trim vectors to (default: keep original) :param size: maximum number of vectors to load (default: all) :param filename: text file to load vectors from (default: from spaCy model) :param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g.
indent_xml(xml_as_string) Indents a string of XML-like objects.
is_annotated(passage[, as_array, as_extra]) Whether the passage is already annotated or only partially annotated
load_spacy_model(model)
read_word_vectors(dim, size, filename) Read word vectors from text file, with an optional first row indicating size and dimension :param dim: dimension to trim vectors to :param size: maximum number of vectors to load :param filename: text file to load vectors from :return: generator: first element is (#vectors, #dims); and all the rest are (word [string], vector [NumPy array])
set_docs(annotated, as_array, as_extra, …) Given spaCy annotations, set values in layer0.extra per paragraph if as_array=True, and in Terminal.extra if as_extra=True
to_annotate(passage_contexts, replace[, …]) Filter passages to get only those that require annotation; split to paragraphs and return generator of (list of tokens, (paragraph index, list of Terminals, Passage) + original context appended) tuples
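
The text format described for read_word_vectors (an optional first row giving size and dimension, then one word and its vector per row) can be parsed roughly as below. This is a standalone sketch, not the library function: it yields plain lists of floats instead of NumPy arrays, reports None for the vector count when no header row is present, and detects the header by checking for exactly two integer fields. All three choices, and the name read_vectors, are assumptions.

```python
import io


def read_vectors(lines, dim=None, size=None):
    """Yield (#vectors, #dims) first, then (word, vector) pairs."""
    rows = (line.split() for line in lines if line.strip())
    row = next(rows, None)
    if row is None:
        return
    if len(row) == 2 and all(f.isdigit() for f in row):
        # Header row: "<number of vectors> <number of dimensions>"
        total, dims = int(row[0]), int(row[1])
        row = next(rows, None)
    else:
        total, dims = None, len(row) - 1
    if dim is not None:
        dims = min(dims, dim)          # trim vectors to at most dim entries
    if size is not None:
        total = size if total is None else min(total, size)
    yield total, dims
    count = 0
    while row is not None and (size is None or count < size):
        yield row[0], [float(v) for v in row[1:dims + 1]]
        count += 1
        row = next(rows, None)


text = "2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"
out = list(read_vectors(io.StringIO(text), dim=2))
```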

Classes

Attr Wrapper for spaCy Attr, determining order for saving in layer0.extra per token when as_array=True
Enum Generic enumeration.
OrderedDict Dictionary that remembers insertion order
attrgetter attrgetter(attr, …) –> attrgetter object
deque deque([iterable[, maxlen]]) –> deque object
groupby(iterable[, key]) Make an iterator that returns consecutive keys and groups from the iterable.
islice islice(iterable, stop) –> islice object islice(iterable, start, stop[, step]) –> islice object
itemgetter itemgetter(item, …) –> itemgetter object
tqdm([iterable, desc, total, leave, file, …]) Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.

Class Inheritance Diagram

Inheritance diagram of ucca.textutil.Attr

ucca.validation Module

Functions

join(items)
tag_to_edge(edges)
validate(passage[, linkage, multigraph])

Classes

ETags alias of ucca.layer1.EdgeTags
L0Tags alias of ucca.layer0.NodeTags
L1Tags alias of ucca.layer1.NodeTags
NodeValidator(node)
attrgetter attrgetter(attr, …) –> attrgetter object
groupby(iterable[, key]) Make an iterator that returns consecutive keys and groups from the iterable.

Class Inheritance Diagram

Inheritance diagram of ucca.validation.NodeValidator

ucca.visualization Module

Functions

draw(passage[, node_ids])
node_label(node)
standoff(p) Visualize to Standoff .ann format, which can be presented with brat :param p: Passage :return: string in Standoff format
tex_escape(text) :param text: a plain text message
tikz(p[, indent, node_ids]) Visualize to TikZ format :param p: Passage :param indent: indentation size or None for no indentation :param node_ids: whether to include node IDs :return: string in TikZ format
topological_layout(passage)
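
Escaping plain text for inclusion in TeX output, as tex_escape is documented to do for a plain text message, might look like the sketch below. The exact set of characters and replacements used by the library is not shown in this reference, so the mapping here is an assumption, and the name tex_escape_sketch is hypothetical.

```python
import re

# Assumed mapping of TeX special characters to escaped forms
_TEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
    "\\": r"\textbackslash{}",
}
_TEX_PATTERN = re.compile("|".join(re.escape(c) for c in _TEX_SPECIALS))


def tex_escape_sketch(text):
    """Replace each special character in a single pass over the text."""
    return _TEX_PATTERN.sub(lambda m: _TEX_SPECIALS[m.group()], text)


escaped = tex_escape_sketch("50% of A_B & {x}")
```

Doing all replacements in one regular-expression pass avoids re-escaping the backslashes and braces introduced by earlier replacements.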