Scripts Documentation

scripts.annotate Module

Functions

annotate_all(passages[, replace, as_array, …]) Run spaCy pipeline on the given passages, unless already annotated.
    :param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their 'extra' dict
    :param replace: even if a given passage is already annotated, replace with new annotation
    :param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to array of ids
    :param as_extra: set 'extra' entries to each terminal
    :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is
    :param lang: optional two-letter language code, will be overridden if passage has "lang" attrib
    :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model
    :param verbose: whether to print annotated text
    :return: generator of annotated passages, which are actually modified in-place (same objects as input)
get_passages_with_progress_bar(filename_patterns)
is_annotated(passage[, as_array, as_extra]) Whether the passage is already annotated or only partially annotated
main(args)
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.
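
The contract documented for annotate_all (lazy generator, in-place modification, skipping passages that are already annotated, passing context through when as_tuples is set) can be illustrated without spaCy. The following is a minimal sketch under stated assumptions, not the library's implementation: `FakePassage`, `fake_annotate_all` and the upper-casing "annotation" are hypothetical stand-ins, and the sketch assumes each input tuple holds a passage-like object and its context.

```python
from dataclasses import dataclass, field


@dataclass
class FakePassage:
    """Hypothetical stand-in for a UCCA Passage (not the real class)."""
    text: str
    extra: dict = field(default_factory=dict)


def fake_annotate_all(passages, replace=False, as_tuples=False):
    """Mimic the documented contract: annotate lazily and in place."""
    for item in passages:
        passage, context = item if as_tuples else (item, None)
        if replace or "doc" not in passage.extra:
            # Placeholder annotation; the real function runs a spaCy pipeline.
            passage.extra["doc"] = passage.text.upper().split()
        # The context is returned untouched next to the same passage object.
        yield (passage, context) if as_tuples else passage


passages = [FakePassage("a b"), FakePassage("c d", extra={"doc": ["kept"]})]
pairs = [(p, i) for i, p in enumerate(passages)]
annotated = list(fake_annotate_all(pairs, as_tuples=True))
```

Because the function is a generator, nothing is annotated until the result is consumed, which is why the sketch wraps the call in `list(...)`.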

scripts.convert_1_0_to_1_2 Module

Functions

annotate_all(passages[, replace, as_array, …]) Run spaCy pipeline on the given passages, unless already annotated.
    :param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their 'extra' dict
    :param replace: even if a given passage is already annotated, replace with new annotation
    :param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to array of ids
    :param as_extra: set 'extra' entries to each terminal
    :param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is
    :param lang: optional two-letter language code, will be overridden if passage has "lang" attrib
    :param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model
    :param verbose: whether to print annotated text
    :return: generator of annotated passages, which are actually modified in-place (same objects as input)
convert_passage(passage, report_writer)
copy_edge(edge[, parent, child, tag, attrib])
destroy(node_or_edge)
extract_aux(terminal, parent, grandparent)
extract_ground(terminal, parent, grandparent)
extract_modal(terminal, parent, grandparent)
extract_relator(terminal, parent, grandparent)
extract_that(terminal, parent, grandparent)
fix_punct(terminal, parent, grandparent)
fix_root_terminal_child(terminal, parent, …)
fix_unary_participant(terminal, parent, …)
flag_relator_starts_main_relation(terminal, …)
flag_suspected_secondary(terminal, parent, …)
fparent(node_or_edge)
get_annotation(terminal, attr)
get_passages_with_progress_bar(filename_patterns)
is_main_relation(node)
main(args)
move_node(node, new_parent[, tag])
remove(parent, child)
set_light_verb_function(terminal, parent, …)
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

scripts.convert_2_0_to_1_2 Module

Functions

convert_passage(passage, report_writer)
copy_edge(edge[, parent, child, tag, attrib])
destroy(node_or_edge)
get_passages_with_progress_bar(filename_patterns)
main(args)
replace_time_and_quantifier(edge)
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

scripts.count_parents_children Module

Functions

clip(l, m)
get_passages_with_progress_bar(filename_patterns)
main(args)
plot_histogram(counter, label[, plot])
plot_pie(counter, label[, plot])

scripts.evaluate_db Module

The evaluation software for UCCA layer 1.

Functions

evaluate(guessed, ref[, converter, verbose, …]) Compare two passages and return requested diagnostics and scores, possibly printing them too.
main(args)

scripts.evaluate_standard Module

The evaluation script for UCCA layer 1.

Functions

check_args(args)
main(args)
match_by_id(guessed, ref)
print_f1(result, eval_type)
summarize(args, results, eval_type)

scripts.find_constructions Module

Functions

add_argument(argparser[, default])
external_write_mode(*args, **kwargs)
extract_candidates(passage[, constructions, …]) Find candidate edges by constructions in UCCA passage.
get_passages_with_progress_bar(filename_patterns)
main(args)

scripts.fix_tokenization Module

Functions

context(i, terminals)
create_token_element(state, text, is_punctuation)
create_unit_element(state, text, tag)
decode_special_chars(tokens)
expand_to_neighboring_punct(i, is_puncts)
    >>> expand_to_neighboring_punct(0, [False, True, True])
false_indices(l)
fix_tokenization(passage, words_set, lang, cw)
from_site(elem) Converts site XML structure to core.Passage object.
get_parents(paragraph, elements)
get_passages_with_progress_bar(filename_patterns)
get_tokenizer([tokenized, lang])
handle_words_set(rule, i, terminals, …) Use a set of words to determine which fix is needed.
insert_punct(insert_index, …)
insert_retokenized(terminal, …)
insert_retokenized_currency(i, terminals, …)
insert_spaces(tokens)
is_punct(text)
main(args)
normalize(passage[, extra])
read_dict(file)
retokenize(i, start, end, terminals, …)
split_apostrophe_to_units(i, terminals, …) Split a token with an apostrophe into Elaborator and Center.
split_apostrophe_unanalyzable(i, terminals, …) Split apostrophe as unanalyzable.
split_hyphen_to_units(i, terminals, …) Split a token with a hyphen into two different units.
split_hyphen_unanalyzable(i, terminals, …) Split a token with hyphens into unanalyzable tokens.
split_possessive_s_to_units(i, terminals, …) Split possessive s into two different units.
split_possessive_s_unanalyzable(i, …) Split possessive s as unanalyzable.
strip_context(new_context, old_context, …)
    >>> strip_context(["I", "'ve", "done"], ["I", "'ve", "done"], 1, 1)
to_site(passage) Converts a passage to the site XML format.
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

Classes

Element
SiteCfg Contains static configuration for conversion to/from the site XML.
SiteUtil Contains utility functions for converting to/from the site XML.
State()

Class Inheritance Diagram

Inheritance diagram of scripts.fix_tokenization.State

scripts.join_passages Module

Functions

get_passages(filename_patterns, **kwargs)
main(args)
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle.
    :param passage: passage object to write
    :param filename: file name to write to
    :param indent: whether to indent each line
    :param binary: whether to write pickle format (or XML)

scripts.join_sdp Module

Functions

main(args)

scripts.load_word_vectors Module

Functions

get_word_vectors([dim, size, filename, vocab]) Get word vectors from spaCy model or from text file.
    :param dim: dimension to trim vectors to (default: keep original)
    :param size: maximum number of vectors to load (default: all)
    :param filename: text file to load vectors from (default: from spaCy model)
    :param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g.
main(args)
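
The dim and size parameters of get_word_vectors describe trimming each vector and capping the number of vectors loaded. A minimal sketch of that idea for the text-file case follows. It is not the module's code: `read_text_vectors` is a hypothetical helper, and the file layout (one `word v1 v2 ...` entry per line, no header line) is an assumption.

```python
import io


def read_text_vectors(lines, dim=None, size=None):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        if size is not None and len(vectors) >= size:
            break  # stop once the requested number of vectors is loaded
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, values = fields[0], [float(v) for v in fields[1:]]
        vectors[word] = values[:dim]  # dim=None keeps the original length
    return vectors


# In-memory stand-in for a vectors text file (assumed format).
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\nfish 0.7 0.8 0.9\n")
vectors = read_text_vectors(sample, dim=2, size=2)
print(vectors)  # {'cat': [0.1, 0.2], 'dog': [0.4, 0.5]}
```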

scripts.normalize Module

Functions

get_passages_with_progress_bar(filename_patterns)
main(args)
normalize(passage[, extra])
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

scripts.pickle_to_standard Module

Functions

file2passage(filename) Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
    :param filename: file name to read from
main(args)
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle.
    :param passage: passage object to write
    :param filename: file name to write to
    :param indent: whether to indent each line
    :param binary: whether to write pickle format (or XML)
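
The write-as-XML-or-pickle and try-XML-then-pickle behaviour described for passage2file and file2passage can be sketched with the standard library alone. This is an illustration of the fallback pattern, not the library's code: `save` and `load` are hypothetical helpers, a bare XML element stands in for a Passage, and pickling the serialized XML bytes is an assumption made to keep the example self-contained.

```python
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET


def save(root, filename, binary=False):
    """Write an XML element as a plain XML file or as a binary pickle."""
    if binary:
        with open(filename, "wb") as f:
            pickle.dump(ET.tostring(root), f)  # stand-in payload: XML bytes
    else:
        ET.ElementTree(root).write(filename, encoding="utf-8")


def load(filename):
    """Try to parse the file as XML first, then fall back to pickle."""
    try:
        return ET.parse(filename).getroot()
    except ET.ParseError:
        with open(filename, "rb") as f:
            return ET.fromstring(pickle.load(f))


root = ET.Element("root", passageID="1")
with tempfile.TemporaryDirectory() as tmp:
    ids = []
    for binary in (False, True):
        path = os.path.join(tmp, "1.pickle" if binary else "1.xml")
        save(root, path, binary=binary)
        ids.append(load(path).get("passageID"))
print(ids)  # ['1', '1']
```

Trying the text format first and falling back on a parse error means callers do not need to know which format a file was written in.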

scripts.replace_tokens_by_dict Module

Functions

glob(pathname, *[, recursive]) Return a list of paths matching a pathname pattern.
main(args)
read_dictionary_from_file(filename)
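
The glob entry above is the standard library's `glob.glob`, listed here because the module imports it. A short sketch of how a filename pattern expands, with and without `recursive=True`; the directory layout and file names are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: two XML files at the top level, one in a subdirectory.
    os.makedirs(os.path.join(tmp, "dev"))
    for name in ("1.xml", "2.xml", "notes.txt", os.path.join("dev", "3.xml")):
        open(os.path.join(tmp, name), "w").close()

    # Non-recursive: only matches files directly inside tmp.
    flat = sorted(os.path.relpath(p, tmp)
                  for p in glob.glob(os.path.join(tmp, "*.xml")))
    # Recursive: "**" matches zero or more directory levels.
    deep = sorted(os.path.relpath(p, tmp)
                  for p in glob.glob(os.path.join(tmp, "**", "*.xml"), recursive=True))

print(flat)
print(deep)
```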

scripts.site_pickle_to_standard Module

Functions

glob(pathname, *[, recursive]) Return a list of paths matching a pathname pattern.
main(args)
pickle_site2passage(filename) Opens a pickle file containing XML in UCCA site format and returns its parsed Passage object
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

scripts.site_to_standard Module

Functions

check_illegal_combinations(args)
db2passage(handle, pid, user) Gets the user's annotation of passage pid from the DB handle and returns it as a passage
fromstring(text[, parser]) Parse XML document from string constant.
glob(pathname, *[, recursive]) Return a list of paths matching a pathname pattern.
main(args)
site2passage(filename) Opens a file and returns its parsed Passage object
write_passage(passage[, output_format, …]) Write a given UCCA passage in any format.

scripts.site_to_text Module

Functions

db2passage(handle, pid, user) Gets the user's annotation of passage pid from the DB handle and returns it as a passage
fromstring(text[, parser]) Parse XML document from string constant.
main(args)
site2passage(filename) Opens a file and returns its parsed Passage object

scripts.split_corpus Module

Functions

copy(src, dest[, link])
copyfile(src, dst, *[, follow_symlinks]) Copy data from src to dst.
main(args)
not_split_dir(filename)
numeric(s)
split_passages(directory, train, dev, link)

scripts.standard_to_pickle Module

Functions

external_write_mode(*args, **kwargs)
file2passage(filename) Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
    :param filename: file name to read from
main(args)
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle.
    :param passage: passage object to write
    :param filename: file name to write to
    :param indent: whether to indent each line
    :param binary: whether to write pickle format (or XML)

scripts.standard_to_sentences Module

Functions

external_write_mode(*args, **kwargs)
extract_terminals(p) Returns an iterator over the terminals of the passage p.
get_passages_with_progress_bar(filename_patterns)
main(args)
normalize(passage[, extra])
passage2file(passage, filename[, indent, binary]) Writes a UCCA passage as a standard XML file or a binary pickle.
    :param passage: passage object to write
    :param filename: file name to write to
    :param indent: whether to indent each line
    :param binary: whether to write pickle format (or XML)
split2sentences(passage[, remarks, lang, ids])
split_passage(passage, ends[, remarks, ids, …]) Split the passage on the given terminal positions.
    :param passage: passage to split
    :param ends: sequence of positions at which the split passages will end
    :param remarks: add original node ID as remarks to the new nodes
    :param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split
    :param suffix_format: in case ids is None, use this format for the running index suffix
    :param suffix_start: in case ids is None, use this starting index for the running index suffix
    :return: sequence of passages
warning(msg, *args, **kwargs) Log a message with severity ‘WARNING’ on the root logger.
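
The roles of ends, ids, suffix_format and suffix_start in split_passage can be shown on a plain token list. This is a sketch of the splitting and ID scheme only, not the library's implementation: `split_tokens` and `base_id` are hypothetical, the default values are chosen for the example, and treating ends as 1-based inclusive positions is an assumption.

```python
def split_tokens(tokens, ends, ids=None, suffix_format="%03d", suffix_start=0,
                 base_id="p"):
    """Split a token list at the given end positions and assign an ID to each part."""
    parts = []
    start = 0
    for index, end in enumerate(ends):
        if ids is not None:
            part_id = ids[index]  # explicit IDs take precedence
        else:
            # Otherwise append a running index suffix to the base ID.
            part_id = base_id + suffix_format % (index + suffix_start)
        parts.append((part_id, tokens[start:end]))
        start = end
    return parts


tokens = ["I", "came", ".", "I", "saw", "."]
parts = split_tokens(tokens, ends=[3, 6])
named = split_tokens(tokens, ends=[3, 6], ids=["first", "second"])
print(parts)  # [('p000', ['I', 'came', '.']), ('p001', ['I', 'saw', '.'])]
```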

Classes

Splitter(sentences[, enum, suffix_format, …])
count count(start=0, step=1) --> count object
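
The count entry is `itertools.count` from the standard library, which yields an endless arithmetic sequence. A brief sketch of using it to produce running index suffixes; the `"%03d"` format is only an example value, not something this listing specifies.

```python
from itertools import count, islice

# count() never stops on its own, so islice takes the first three values.
suffixes = ["%03d" % i for i in islice(count(start=1, step=1), 3)]
print(suffixes)  # ['001', '002', '003']
```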

Class Inheritance Diagram

Inheritance diagram of scripts.standard_to_sentences.Splitter

scripts.standard_to_site Module

Functions

external_write_mode(*args, **kwargs)
get_passages_with_progress_bar(filename_patterns)
main(args)
tostring(element[, encoding, method, …]) Generate string representation of XML element.

scripts.standard_to_text Module

Functions

file2passage(filename) Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
    :param filename: file name to read from
get_passages_with_progress_bar(filename_patterns)
glob(pathname, *[, recursive]) Return a list of paths matching a pathname pattern.
main(args)
numeric(x)
to_text(passage[, sentences, lang]) Converts from a Passage object to tokenized strings.
write_text(passage, f, sentences, lang[, …])

scripts.statistics Module

Functions

get_passages_with_progress_bar(filename_patterns)
main(args)

scripts.unique_roles Module

Functions

get_passages_with_progress_bar(filename_patterns)
main(args)

scripts.validate Module

Functions

Pool Returns a process pool object
check_args(parser, args)
external_write_mode(*args, **kwargs)
get_passages_with_progress_bar(filename_patterns)
main(args)
normalize(passage[, extra])
print_errors(passage_id, errors[, id_len])
validate(passage[, linkage, multigraph])

Classes

Validator([normalization, extra, linkage, …])

Class Inheritance Diagram

Inheritance diagram of scripts.validate.Validator

scripts.visualize Module

Functions

external_write_mode(*args, **kwargs)
get_passages(filename_patterns, **kwargs)
get_passages_with_progress_bar(filename_patterns)
main(args)
print_text(args, text, suffix)
split2sentences(passage[, remarks, lang, ids])