API Documentation¶
Getting Started¶
To load UCCA passages from XML files, manipulate them and write to files, use the following code template:
from ucca.ioutil import get_passages_with_progress_bar, write_passage
for passage in get_passages_with_progress_bar(filenames):
    ...
    write_passage(passage)
Each passage is an instance of the ucca.core.Passage class.
XML files can be downloaded from the various UCCA corpora.
ucca.constructions Module¶
Functions¶
add_argument (argparser[, default]) |
|
create_category_construction (tag) |
|
create_passage_yields (p, *args[, tags]) |
|
diff_terminals (*passages) |
|
extract_candidates (passage[, constructions, …]) |
Find candidate edges by constructions in UCCA passage. |
get_by_name (name) |
|
get_by_names ([names]) |
|
positions (terminals) |
|
terminal_ids (passage) |
|
verify_terminals_match (passage, reference) |
Classes¶
Candidate (edge[, reference, …]) |
|
Categories () |
|
Construction (name, description, criterion[, …]) |
|
EdgeTags |
Layer 1 Edge tags. |
NodeTags |
Layer 1 Node tags. |
OrderedDict |
Dictionary that remembers insertion order |
chain |
chain(*iterables) –> chain object |
Class Inheritance Diagram¶
ucca.convert Module¶
Converter module between different UCCA annotation formats.
This module contains utilities to convert UCCA annotations between different formats. The core.Passage form acts as a pivot for all conversions: every other format is converted to or from it.
The other supported formats are:
- site XML
- standard XML
- conll (CoNLL-X dependency parsing shared task)
- sdp (SemEval 2015 semantic dependency parsing shared task)
Functions¶
attach_punct (l0, l1) |
|
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle. :param filename: name of the file to read |
from_json (lines, *args[, …]) |
Convert text (or dict) in UCCA-App JSON format to a Passage object. |
from_site (elem) |
Converts site XML structure to core.Passage object. |
from_standard (root[, extra_funcs]) |
|
from_text (text[, passage_id, tokenized, …]) |
Converts from tokenized strings to a Passage object. |
get_categories_details (d) |
|
get_json_attrib (d) |
|
join_passages (passages[, passage_id, remarks]) |
Join passages to one passage with all the nodes in order :param passages: sequence of passages to join :param passage_id: ID of newly created passage (otherwise, ID of first passage) :param remarks: add original node ID as remarks to the new nodes :return: joined passage |
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
pickle2passage (filename) |
|
split2paragraphs (passage[, remarks, lang, ids]) |
|
split2segments (passage, is_sentences[, …]) |
Split passage to sub-passages :param passage: Passage object :param is_sentences: if True, split to sentences; otherwise, paragraphs :param remarks: Whether to add remarks with original node IDs :param lang: language to use for sentence splitting model :param ids: optional iterable of ids to set passage IDs for each split :return: sequence of passages |
split2sentences (passage[, remarks, lang, ids]) |
|
split_passage (passage, ends[, remarks, ids, …]) |
Split the passage on the given terminal positions :param passage: passage to split :param ends: sequence of positions at which the split passages will end :param remarks: add original node ID as remarks to the new nodes :param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split :param suffix_format: in case ids is None, use this format for the running index suffix :param suffix_start: in case ids is None, use this starting index for the running index suffix :return: sequence of passages |
to_json (passage, *args[, return_dict, …]) |
Convert a Passage object to text (or dict) in UCCA-App JSON :param passage: the Passage object to convert :param return_dict: whether to return dict rather than list of lines :param tok_task: either None (to do tokenization too), or a completed tokenization task dict with token IDs, or True, to indicate that the function should do only tokenization and not annotation :param all_categories: list of category dicts so that IDs can be added, if available - otherwise names are used :param skip_category_mapping: if False, translate edge tag abbreviations to category names; if True, don’t :return: list of lines in JSON format if return_dict=False, or task dict if True |
to_sequence (passage) |
Converts from a Passage object to linearized text sequence. |
to_site (passage) |
Converts a passage to the site XML format. |
to_standard (passage) |
Converts a Passage object to a standard XML root element. |
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
xml2passage (filename) |
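The splitting behaviour described for split_passage above (chunks that end at given terminal positions, with IDs taken from ids or generated from a running suffix) can be illustrated on a plain token list. This is a stand-alone sketch, not the library's implementation: split_tokens and base_id are hypothetical names, and the 1-based positions and the default values of suffix_format and suffix_start are assumptions.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def split_tokens(tokens: Sequence[str], ends: Sequence[int], base_id: str = "1",
                 ids: Optional[Iterable[str]] = None,
                 suffix_format: str = "%03d", suffix_start: int = 0
                 ) -> List[Tuple[str, List[str]]]:
    """Split tokens into (id, chunk) pairs; each chunk ends at a 1-based position in `ends`."""
    id_iter = iter(ids) if ids is not None else None
    chunks = []
    start = 0
    for index, end in enumerate(ends):
        if id_iter is not None:
            chunk_id = next(id_iter)  # explicit ID supplied by the caller
        else:
            # No explicit IDs: append a formatted running index to the base ID.
            chunk_id = base_id + suffix_format % (index + suffix_start)
        chunks.append((chunk_id, list(tokens[start:end])))
        start = end
    return chunks


tokens = ["He", "left", ".", "She", "stayed", "."]
parts = split_tokens(tokens, ends=[3, 6], base_id="120")
named = split_tokens(tokens, ends=[3, 6], ids=["a", "b"])
```

The real function operates on Passage objects and their nodes; only the position and ID bookkeeping is sketched here.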
Classes¶
EdgeTags |
Layer 1 Edge tags. |
JSONDecodeError (msg, doc, pos) |
Subclass of ValueError with the following additional properties: |
SiteCfg |
Contains static configuration for conversion to/from the site XML. |
SiteUtil |
Contains utility functions for converting to/from the site XML. |
SiteXMLUnknownElement |
|
attrgetter |
attrgetter(attr, …) –> attrgetter object |
defaultdict |
defaultdict(default_factory[, …]) –> dict with default factory |
groupby (iterable[, key]) |
keys and groups from the iterable. |
itemgetter |
itemgetter(item, …) –> itemgetter object |
repeat (object [,times]) |
for the specified number of times. |
Class Inheritance Diagram¶
ucca.core Module¶
This module encapsulates the basic elements of the UCCA annotation.
A UCCA annotation is practically a directed acyclic graph (DAG), which represents a Passage of text and its annotation. The annotation itself is divided into Layer objects; in each layer, Node objects are connected to one another and to Nodes in other layers using Edge objects.
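The structure described above can be pictured with a minimal stand-alone sketch. The names MiniPassage, MiniNode and MiniEdge are hypothetical and illustrative only: they are not the ucca.core classes and do not mirror their constructors. The edge tag string is likewise made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiniEdge:
    """Labeled edge from a parent node to a child node (hypothetical)."""
    parent: "MiniNode"
    child: "MiniNode"
    tag: str


@dataclass
class MiniNode:
    """Node that belongs to one layer and owns its outgoing edges (hypothetical)."""
    ID: str
    layer: str
    outgoing: List[MiniEdge] = field(default_factory=list)

    def add(self, tag: str, child: "MiniNode") -> MiniEdge:
        edge = MiniEdge(self, child, tag)
        self.outgoing.append(edge)
        return edge


@dataclass
class MiniPassage:
    """Container that groups nodes by layer ID (hypothetical)."""
    ID: str
    layers: Dict[str, List[MiniNode]] = field(default_factory=dict)

    def new_node(self, node_id: str, layer: str) -> MiniNode:
        node = MiniNode(node_id, layer)
        self.layers.setdefault(layer, []).append(node)
        return node


passage = MiniPassage("demo")
word = passage.new_node("0.1", "0")   # a node in layer 0
unit = passage.new_node("1.1", "1")   # a node in layer 1
edge = unit.add("T", word)            # an edge may connect nodes in different layers
```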
Functions¶
edge_id_orderkey (edge) |
Key function which sorts Edges by their IDs (using id_orderkey()). |
id_orderkey (node) |
Key function which sorts by layer (string), then by unique ID (int). |
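As an illustration of the ordering that id_orderkey is described as providing (layer as a string, then unique ID as an int), here is a stand-alone sketch. It assumes IDs are strings of the form "<layer>.<unique number>" separated by a dot, which this page does not state. sketch_id_orderkey is a hypothetical name and works on plain strings rather than Node objects.

```python
def sketch_id_orderkey(node_id: str):
    """Return a sort key: layer as a string, then the unique ID as an int."""
    layer, unique = node_id.split(".", 1)
    return layer, int(unique)


ids = ["1.10", "0.2", "1.2", "0.11"]
ordered = sorted(ids, key=sketch_id_orderkey)
naive = sorted(ids)  # plain string sorting, shown for comparison
```

Comparing the unique part as an integer is what keeps "1.2" ahead of "1.10"; plain string sorting would reverse them.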
Classes¶
Category (tag[, slot, layer, parent]) |
When considering refinement layers, each edge can have multiple tags sorted in a certain hierarchy. |
DuplicateIdError |
Exception raised when trying to add an element with an existing ID. |
Edge (root, parent, child[, tag, attrib]) |
Labeled edge between two Node objects in UCCA annotation graph. |
FrozenPassageError |
Exception raised when trying to modify a frozen Passage. |
Layer (ID, root[, attrib, orderkey]) |
Group of similar Node objects in UCCA annotation graph. |
MissingNodeError |
Exception raised when trying to access a non-existent Node. |
ModifyPassage (fn) |
Decorator for changing a Passage or any member of it. |
Node (ID, root, tag[, attrib, orderkey]) |
Labeled Node in UCCA annotation graph. |
Passage (ID[, attrib]) |
An annotated text with UCCA annotation graph. |
UCCAError |
Base class for all UCCA package exceptions. |
UnimplementedMethodError |
Exception raised when trying to call a not-yet-implemented method. |
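The table above lists a ModifyPassage decorator together with a FrozenPassageError. One common way to combine such a pair is for the decorator to check a frozen flag before running the wrapped method. The sketch below shows that general pattern under that assumption. It does not reproduce the library's code, and FrozenError, modifies and Box are hypothetical names.

```python
import functools


class FrozenError(Exception):
    """Raised when a frozen container is modified (hypothetical stand-in)."""


def modifies(fn):
    """Decorator: refuse to run a mutating method on a frozen object."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        if getattr(self, "frozen", False):
            raise FrozenError("cannot call %s on a frozen object" % fn.__name__)
        return fn(self, *args, **kwargs)
    return wrapper


class Box:
    """Tiny container with one guarded mutating method."""

    def __init__(self):
        self.items = []
        self.frozen = False

    @modifies
    def add(self, item):
        self.items.append(item)


box = Box()
box.add("x")          # allowed: the box is not frozen yet
box.frozen = True
try:
    box.add("y")      # rejected: the guard raises before the method body runs
    blocked = False
except FrozenError:
    blocked = True
```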
Class Inheritance Diagram¶
ucca.diffutil Module¶
Functions¶
diff_passages (true_passage, pred_passage[, …]) |
Debug method to print missing or mistaken attributes, nodes and edges |
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
ucca.evaluation Module¶
The evaluation library for UCCA layer 1.
v1.4
2016-12-25: move common Fs to root before evaluation
2017-01-04: flatten centers, do not add 1 (for root) to mutual
2017-01-16: fix bug in moving common Fs
2018-04-12: exclude punctuation nodes regardless of edge tag
2018-12-11: fix another bug in moving common Fs
2019-01-22: support multiple categories per edge
2019-11-29: evaluate implicit nodes too (by their parent’s yield)
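To give a feel for what yield-based scoring can look like, the following stand-alone sketch compares two sets of units, each identified by an edge tag and the set of terminal positions it covers, and reports precision, recall and F1. It assumes units are matched exactly on tag and yield, which is suggested by names such as get_yield and SummaryStatistics but is not spelled out on this page. The function name score and the sample data are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Unit = Tuple[str, FrozenSet[int]]  # (edge tag, terminal positions covered)


def score(guessed: Set[Unit], ref: Set[Unit]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching units."""
    matches = len(guessed & ref)
    precision = matches / len(guessed) if guessed else 0.0
    recall = matches / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


ref = {("A", frozenset({1})), ("P", frozenset({2})), ("A", frozenset({3, 4}))}
guessed = {("A", frozenset({1})), ("P", frozenset({2})), ("D", frozenset({3, 4}))}
p, r, f1 = score(guessed, ref)  # two of the three units agree on tag and yield
```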
Functions¶
create_passage_yields (p, *args[, tags]) |
|
evaluate (guessed, ref[, converter, verbose, …]) |
Compare two passages and return requested diagnostics and scores, possibly printing them too. |
expand_equivalents (tag_set) |
Returns a set of all the tags in the tag set or those equivalent to them :param tag_set: set of tags (strings) to expand |
get_by_names ([names]) |
|
get_text (p, positions) |
|
get_yield (unit) |
|
move_functions (p1, p2) |
Move any common Fs to the root |
print_tags_and_text (p, yield_tags) |
Classes¶
Counter (**kwds) |
Dict subclass for counting hashable items. |
EdgeTags |
Layer 1 Edge tags. |
Evaluator (verbose, constructions, units, …) |
|
EvaluatorResults (results[, default]) |
|
NodeTags |
Layer 1 Node tags. |
OrderedDict |
Dictionary that remembers insertion order |
Scores (evaluator_results[, name, …]) |
|
SummaryStatistics (num_matches, …[, errors]) |
|
attrgetter |
attrgetter(attr, …) –> attrgetter object |
groupby (iterable[, key]) |
keys and groups from the iterable. |
Class Inheritance Diagram¶
ucca.ioutil Module¶
Input/output utility functions for UCCA scripts.
Functions¶
contextmanager (func) |
@contextmanager decorator. |
external_write_mode (*args, **kwargs) |
|
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read it both as a standard XML file and as a binary pickle. :param filename: name of the file to read |
from_text (text[, passage_id, tokenized, …]) |
Converts from tokenized strings to a Passage object. |
gen_files (files_and_dirs) |
|
get_passages (filename_patterns, **kwargs) |
|
get_passages_with_progress_bar (filename_patterns) |
|
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle :param passage: passage object to write :param filename: file name to write to :param indent: whether to indent each line :param binary: whether to write pickle format (or XML) |
read_files_and_dirs (files_and_dirs[, …]) |
|
resolve_patterns (filename_patterns) |
|
split2segments (passage, is_sentences[, …]) |
Split passage to sub-passages :param passage: Passage object :param is_sentences: if True, split to sentences; otherwise, paragraphs :param remarks: Whether to add remarks with original node IDs :param lang: language to use for sentence splitting model :param ids: optional iterable of ids to set passage IDs for each split :return: sequence of passages |
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
Classes¶
LazyLoadedPassages (files[, sentences, …]) |
Iterable interface to Passage objects that loads files on-the-go and can be iterated more than once |
ParseError |
|
Passage (ID[, attrib]) |
An annotated text with UCCA annotation graph. |
chain |
chain(*iterables) –> chain object |
defaultdict |
defaultdict(default_factory[, …]) –> dict with default factory |
filterfalse |
filterfalse(function or None, sequence) –> filterfalse object |
tqdm ([iterable, desc, total, leave, file, …]) |
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. |
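LazyLoadedPassages is described above as an iterable that loads files on the go and can be iterated more than once. The general pattern behind such a class can be sketched without the library: keep only the file names, and build a fresh generator each time iteration starts. LazyLoaded and fake_load below are hypothetical names, and the loader just upper-cases the name instead of parsing a file.

```python
from typing import Callable, Iterator, List, Sequence


class LazyLoaded:
    """Re-iterable view that calls `load` on each item only when it is reached."""

    def __init__(self, names: Sequence[str], load: Callable[[str], object]):
        self.names = list(names)
        self.load = load

    def __iter__(self) -> Iterator[object]:
        # A fresh generator per iteration is what makes repeated loops possible.
        return (self.load(name) for name in self.names)

    def __len__(self) -> int:
        return len(self.names)


calls: List[str] = []


def fake_load(name: str) -> str:
    calls.append(name)   # record when loading really happens
    return name.upper()


items = LazyLoaded(["a.xml", "b.xml"], fake_load)
calls_before_iteration = list(calls)   # nothing has been loaded yet
first_pass = list(items)
second_pass = list(items)              # iterating again reloads every item
```

The trade-off of this design is that each pass pays the loading cost again, in exchange for never holding all items in memory at once.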
Class Inheritance Diagram¶
ucca.layer0 Module¶
Encapsulates the layer of all word and punctuation symbols.
Layer 0 is the basic layer for all the UCCA annotation, as it includes the actual words and punctuation marks found in the core.Passage.
Layer 0 has only one type of node, Terminal. This is a subtype of core.Node, and can have one of two tags: Word or Punctuation.
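The two-way Word/Punctuation distinction can be illustrated with a small stand-alone classifier. The rule used here, treating a token as punctuation when every character belongs to a Unicode punctuation category, is an assumption made for the example and not necessarily how the library assigns tags. terminal_tag is a hypothetical name.

```python
import unicodedata


def terminal_tag(token: str) -> str:
    """Return "Punctuation" if every character is Unicode punctuation, else "Word"."""
    is_punct = bool(token) and all(
        unicodedata.category(char).startswith("P") for char in token
    )
    return "Punctuation" if is_punct else "Word"


tokens = ["Hello", ",", "world", "...", "don't"]
tags = [(token, terminal_tag(token)) for token in tokens]
```

A token such as "don't" mixes letters with an apostrophe, so under this rule it stays a Word.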
Classes¶
Layer0 (root[, attrib]) |
Represents the Terminal objects layer. |
NodeTags |
|
Terminal (ID, root, tag[, attrib, orderkey]) |
Layer 0 Node type, represents a word or a punctuation mark. |
Class Inheritance Diagram¶
ucca.layer1 Module¶
Describes the foundational level elements (layer 1) of the UCCA annotation.
Layer 1 is the foundational layer of UCCA, whose Nodes and Edges represent scene objects and relations. The basic building blocks of this layer are the FNode, which is a participant in a scene relation (including the relation itself), and the various Edges between these Nodes, which represent the type of relation between the Nodes.
Classes¶
EdgeTags |
Layer 1 Edge tags. |
FoundationalNode (ID, root, tag[, attrib, …]) |
The basic building block of UCCA annotation, represents semantic units. |
Layer1 (root[, attrib, orderkey]) |
|
Linkage (ID, root, tag[, attrib, orderkey]) |
A Linkage between parallel scenes. |
MissingRelationError |
Exception raised when a required edge is not present. |
NodeTags |
Layer 1 Node tags. |
PunctNode (ID, root, tag[, attrib, orderkey]) |
Encapsulates punctuation layer0.Terminal objects. |
Class Inheritance Diagram¶
ucca.normalization Module¶
Functions¶
attach_punct (l0, l1) |
|
attach_terminals (l0, l1) |
|
copy_edge (edge[, parent, child, tag, attrib]) |
|
destroy (node_or_edge) |
|
detach_punct (l1) |
|
flatten_centers (node) |
Whenever there are Cs inside Cs, remove the external C. |
flatten_functions (node) |
Whenever there is an F as an only child, remove it. |
flatten_participants (node) |
Whenever there is an A as an only child, remove it. |
fparent (node_or_edge) |
|
lowest_common_ancestor (*nodes) |
|
move_elements (node, tags, parent_tags[, forward]) |
|
move_scene_elements (node) |
|
move_sub_scene_elements (node) |
|
nearest_parent (l0, *terminals) |
|
nearest_word (l0, position, step) |
|
normalize (passage[, extra]) |
|
normalize_node (node, l1, extra) |
|
reattach_punct (l0, l1) |
|
reattach_terminals (l0, l1) |
|
remove (parent, child) |
|
remove_unmarked_implicits (node) |
|
replace_center (edge) |
|
replace_edge_tags (node) |
|
separate_scenes (node, l1[, top_level]) |
|
split_coordinated_main_rel (node, l1) |
|
traverse_up_centers (node) |
ucca.textutil Module¶
Utility functions for UCCA package.
Functions¶
annotate (passage, *args, **kwargs) |
Run the spaCy pipeline on the given passage, unless it is already annotated. :param passage: Passage object, whose layer 0 nodes will get new entries in the 'extra' dict |
annotate_all (passages[, replace, as_array, …]) |
Run the spaCy pipeline on the given passages, unless they are already annotated. :param passages: iterable of Passage objects, whose layer 0 nodes will get new entries in the 'extra' dict :param replace: even if a given passage is already annotated, replace with new annotation :param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to array of ids :param as_extra: set 'extra' entries to each terminal :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is :param lang: optional two-letter language code, will be overridden if passage has "lang" attrib :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model :param verbose: whether to print annotated text :return: generator of annotated passages, which are actually modified in-place (same objects as input) |
annotate_as_tuples (passages[, replace, …]) |
|
break2paragraphs (passage[, return_terminals]) |
Breaks into paragraphs according to the annotation. |
break2sentences (passage[, lang]) |
Breaks paragraphs into sentences according to the annotation. |
contextmanager (func) |
@contextmanager decorator. |
external_write_mode (*args, **kwargs) |
|
extract_terminals (p) |
returns an iterator of the terminals of the passage p |
get_lang (passage_context) |
|
get_nlp ([lang]) |
Load spaCy model for a given language, determined by 'models' dict or by MODEL_ENV_VAR |
get_tokenizer ([tokenized, lang]) |
|
get_vocab ([vocab, lang]) |
|
get_word_vectors ([dim, size, filename, vocab]) |
Get word vectors from spaCy model or from text file :param dim: dimension to trim vectors to (default: keep original) :param size: maximum number of vectors to load (default: all) :param filename: text file to load vectors from (default: from spaCy model) :param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g. |
indent_xml (xml_as_string) |
Indents a string of XML-like objects. |
is_annotated (passage[, as_array, as_extra]) |
Whether the passage is already annotated or only partially annotated |
load_spacy_model (model) |
|
read_word_vectors (dim, size, filename) |
Read word vectors from text file, with an optional first row indicating size and dimension :param dim: dimension to trim vectors to :param size: maximum number of vectors to load :param filename: text file to load vectors from :return: generator: first element is (#vectors, #dims); and all the rest are (word [string], vector [NumPy array]) |
set_docs (annotated, as_array, as_extra, …) |
Given spaCy annotations, set values in layer0.extra per paragraph if as_array=True, and in Terminal.extra if as_extra=True |
to_annotate (passage_contexts, replace[, …]) |
Filter passages to get only those that require annotation; split to paragraphs and return generator of (list of tokens, (paragraph index, list of Terminals, Passage) + original context appended) tuples |
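The output shape described for read_word_vectors above (a first element with the number of vectors and dimensions, followed by word/vector pairs) can be sketched without the library. This stand-alone version is an approximation under several assumptions: it reads from any iterable of lines rather than a file name, returns plain lists of floats instead of NumPy arrays, loads all rows up front, and detects the optional header with a simple heuristic. read_vectors is a hypothetical name.

```python
import io


def read_vectors(lines, dim=None, size=None):
    """Yield (#vectors, #dims) first, then (word, vector) pairs."""
    rows = [line.split() for line in lines if line.strip()]
    dims = None
    # Heuristic: a first row of exactly two integers is treated as the header.
    if rows and len(rows[0]) == 2 and all(f.isdigit() for f in rows[0]):
        dims = int(rows[0][1])
        rows = rows[1:]
    if size is not None:
        rows = rows[:size]          # keep at most `size` vectors
    if dims is None:
        dims = len(rows[0]) - 1 if rows else 0
    if dim is not None:
        dims = min(dim, dims)       # trim vectors to the requested dimension
    yield len(rows), dims
    for word, *values in rows:
        yield word, [float(v) for v in values[:dims]]


text = "2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n"
result = list(read_vectors(io.StringIO(text), dim=2))
no_header = list(read_vectors(["cat 0.1 0.2 0.3"], size=1))
```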
Classes¶
Attr |
Wrapper for spaCy Attr, determining order for saving in layer0.extra per token when as_array=True |
Enum |
Generic enumeration. |
OrderedDict |
Dictionary that remembers insertion order |
attrgetter |
attrgetter(attr, …) –> attrgetter object |
deque |
deque([iterable[, maxlen]]) –> deque object |
groupby (iterable[, key]) |
keys and groups from the iterable. |
islice |
islice(iterable, stop) –> islice object islice(iterable, start, stop[, step]) –> islice object |
itemgetter |
itemgetter(item, …) –> itemgetter object |
tqdm ([iterable, desc, total, leave, file, …]) |
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. |
Class Inheritance Diagram¶
ucca.validation Module¶
Functions¶
join (items) |
|
tag_to_edge (edges) |
|
validate (passage[, linkage, multigraph]) |
Classes¶
ETags |
alias of ucca.layer1.EdgeTags |
L0Tags |
alias of ucca.layer0.NodeTags |
L1Tags |
alias of ucca.layer1.NodeTags |
NodeValidator (node) |
|
attrgetter |
attrgetter(attr, …) –> attrgetter object |
groupby (iterable[, key]) |
keys and groups from the iterable. |
Class Inheritance Diagram¶
ucca.visualization Module¶
Functions¶
draw (passage[, node_ids]) |
|
node_label (node) |
|
standoff (p) |
Visualize to Standoff .ann format, which can be presented with brat :param p: Passage :return: string in Standoff format |
tex_escape (text) |
|
tikz (p[, indent, node_ids]) |
Visualize to TikZ format :param p: Passage :param indent: indentation size or None for no indentation :param node_ids: whether to include node IDs :return: string in TikZ format |
topological_layout (passage) |