Scripts Documentation¶
scripts.annotate Module¶
Functions¶
annotate_all (passages[, replace, as_array, …]) |
Run spaCy pipeline on the given passages, unless already annotated.
:param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their 'extra' dict
:param replace: even if a given passage is already annotated, replace with new annotation
:param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to array of ids
:param as_extra: set 'extra' entries to each terminal
:param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is
:param lang: optional two-letter language code, will be overridden if passage has "lang" attrib
:param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model
:param verbose: whether to print annotated text
:return: generator of annotated passages, which are actually modified in-place (same objects as input) |
get_passages_with_progress_bar (filename_patterns) |
|
is_annotated (passage[, as_array, as_extra]) |
Whether the passage is already annotated or only partially annotated |
main (args) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
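The annotate-unless-already-annotated behaviour described for annotate_all can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: `FakePassage`, `fake_annotate` and `annotate_all_sketch` are not part of the scripts, no spaCy model is loaded, and the tuples carry passage objects rather than passage text. It only shows a generator that modifies objects in place, honours a `replace` flag, and passes the context through unchanged when `as_tuples` is set.

```python
class FakePassage:
    """Hypothetical stand-in for a UCCA Passage (not the real class)."""

    def __init__(self, text):
        self.text = text
        self.extra = {}  # plays the role of the layer 0 'extra' dict


def is_annotated_sketch(passage):
    # Assumption: a passage counts as annotated once extra["doc"] exists.
    return "doc" in passage.extra


def fake_annotate(passage):
    # Placeholder for the spaCy pipeline: store lowercased tokens.
    passage.extra["doc"] = passage.text.lower().split()


def annotate_all_sketch(passages, replace=False, as_tuples=False):
    """Yield the same objects that were passed in, annotated in place."""
    for item in passages:
        passage, context = item if as_tuples else (item, None)
        if replace or not is_annotated_sketch(passage):
            fake_annotate(passage)
        yield (passage, context) if as_tuples else passage


# Usage: the context comes back as-is and the passage is the same object.
p = FakePassage("Hello World")
out = list(annotate_all_sketch([(p, "ctx")], as_tuples=True))

# An existing annotation is kept unless replace=True.
p.extra["doc"] = ["stale"]
kept = next(annotate_all_sketch([p])).extra["doc"]
replaced = next(annotate_all_sketch([p], replace=True)).extra["doc"]
```

Because the result is a generator, nothing is annotated until it is consumed, for example with `list(...)` or a `for` loop.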
scripts.convert_1_0_to_1_2 Module¶
Functions¶
annotate_all (passages[, replace, as_array, …]) |
Run spaCy pipeline on the given passages, unless already annotated.
:param passages: iterable of Passage objects, whose layer 0 nodes will have entries added to their 'extra' dict
:param replace: even if a given passage is already annotated, replace with new annotation
:param as_array: instead of adding 'extra' entries to each terminal, set layer 0 extra["doc"] to array of ids
:param as_extra: set 'extra' entries to each terminal
:param as_tuples: treat input as tuples of (passage text, context), and return context for each passage as-is
:param lang: optional two-letter language code, will be overridden if passage has "lang" attrib
:param vocab: optional dictionary of vocabulary IDs to string values, to avoid loading spaCy model
:param verbose: whether to print annotated text
:return: generator of annotated passages, which are actually modified in-place (same objects as input) |
convert_passage (passage, report_writer) |
|
copy_edge (edge[, parent, child, tag, attrib]) |
|
destroy (node_or_edge) |
|
extract_aux (terminal, parent, grandparent) |
|
extract_ground (terminal, parent, grandparent) |
|
extract_modal (terminal, parent, grandparent) |
|
extract_relator (terminal, parent, grandparent) |
|
extract_that (terminal, parent, grandparent) |
|
fix_punct (terminal, parent, grandparent) |
|
fix_root_terminal_child (terminal, parent, …) |
|
fix_unary_participant (terminal, parent, …) |
|
flag_relator_starts_main_relation (terminal, …) |
|
flag_suspected_secondary (terminal, parent, …) |
|
fparent (node_or_edge) |
|
get_annotation (terminal, attr) |
|
get_passages_with_progress_bar (filename_patterns) |
|
is_main_relation (node) |
|
main (args) |
|
move_node (node, new_parent[, tag]) |
|
remove (parent, child) |
|
set_light_verb_function (terminal, parent, …) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.convert_2_0_to_1_2 Module¶
Functions¶
convert_passage (passage, report_writer) |
|
copy_edge (edge[, parent, child, tag, attrib]) |
|
destroy (node_or_edge) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
replace_time_and_quantifier (edge) |
|
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.count_parents_children Module¶
Functions¶
clip (l, m) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
plot_histogram (counter, label[, plot]) |
|
plot_pie (counter, label[, plot]) |
scripts.evaluate_db Module¶
The evaluation software for UCCA layer 1.
scripts.evaluate_standard Module¶
The evaluation script for UCCA layer 1.
Functions¶
check_args (args) |
|
main (args) |
|
match_by_id (guessed, ref) |
|
print_f1 (result, eval_type) |
|
summarize (args, results, eval_type) |
scripts.find_constructions Module¶
scripts.fix_tokenization Module¶
Functions¶
context (i, terminals) |
|
create_token_element (state, text, is_punctuation) |
|
create_unit_element (state, text, tag) |
|
decode_special_chars (tokens) |
|
expand_to_neighboring_punct (i, is_puncts) |
>>> expand_to_neighboring_punct(0, [False, True, True])
|
false_indices (l) |
|
fix_tokenization (passage, words_set, lang, cw) |
|
from_site (elem) |
Converts a site XML structure to a core.Passage object. |
get_parents (paragraph, elements) |
|
get_passages_with_progress_bar (filename_patterns) |
|
get_tokenizer ([tokenized, lang]) |
|
handle_words_set (rule, i, terminals, …) |
Use a set of words to determine which fix is needed. |
insert_punct (insert_index, …) |
|
insert_retokenized (terminal, …) |
|
insert_retokenized_currency (i, terminals, …) |
|
insert_spaces (tokens) |
|
is_punct (text) |
|
main (args) |
|
normalize (passage[, extra]) |
|
read_dict (file) |
|
retokenize (i, start, end, terminals, …) |
|
split_apostrophe_to_units (i, terminals, …) |
Split a token with an apostrophe into Elaborator and Center. |
split_apostrophe_unanalyzable (i, terminals, …) |
Split apostrophe as unanalyzable. |
split_hyphen_to_units (i, terminals, …) |
Split a token with a hyphen into two different units. |
split_hyphen_unanalyzable (i, terminals, …) |
Split a token with hyphens into unanalyzable tokens. |
split_possessive_s_to_units (i, terminals, …) |
Split possessive s into two different units. |
split_possessive_s_unanalyzable (i, …) |
Split possessive s as unanalyzable. |
strip_context (new_context, old_context, …) |
>>> strip_context(["I", "'ve", "done"], ["I", "'ve", "done"], 1, 1)
|
to_site (passage) |
Converts a passage to the site XML format. |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
Classes¶
Element |
|
SiteCfg |
Contains static configuration for conversion to/from the site XML. |
SiteUtil |
Contains utility functions for converting to/from the site XML. |
State () |
scripts.join_passages Module¶
Functions¶
get_passages (filename_patterns, **kwargs) |
|
main (args) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
scripts.load_word_vectors Module¶
Functions¶
get_word_vectors ([dim, size, filename, vocab]) |
Get word vectors from spaCy model or from text file.
:param dim: dimension to trim vectors to (default: keep original)
:param size: maximum number of vectors to load (default: all)
:param filename: text file to load vectors from (default: from spaCy model)
:param vocab: instead of strings, look up keys of returned dict in vocab (use lang str, e.g. |
main (args) |
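As a rough illustration of the `dim` and `size` parameters of get_word_vectors, the sketch below loads vectors from text. It assumes a whitespace-separated format with one word followed by its values per line; that format and the helper name `load_vectors_sketch` are assumptions made for this example. It is not the script's own code and does not touch spaCy.

```python
import io


def load_vectors_sketch(lines, dim=None, size=None):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        if size is not None and len(vectors) >= size:
            break  # stop once the maximum number of vectors is loaded
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        values = [float(v) for v in parts[1:]]
        # Trim to the requested dimension, or keep the original length.
        vectors[parts[0]] = values[:dim] if dim is not None else values
    return vectors


text = "cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\nfox 0.7 0.8 0.9\n"
trimmed = load_vectors_sketch(io.StringIO(text), dim=2, size=2)
full = load_vectors_sketch(io.StringIO(text))
```

Here `trimmed` holds only the first two words with two values each, while `full` keeps all three vectors at their original length.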
scripts.normalize Module¶
scripts.pickle_to_standard Module¶
Functions¶
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
:param filename: file name to read from |
main (args) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
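The "XML or binary pickle" behaviour of file2passage and passage2file can be sketched with the standard library alone. The helpers `write_tree` and `read_tree` are hypothetical and work on plain `xml.etree.ElementTree` elements instead of Passage objects. The fallback order (XML first, then pickle) is an assumption for illustration, not a statement about how the scripts implement it.

```python
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET


def write_tree(root, filename, binary=False):
    """Write an element as a pickle if binary is set, otherwise as XML."""
    if binary:
        with open(filename, "wb") as f:
            pickle.dump(ET.tostring(root), f)  # pickle the serialized bytes
    else:
        ET.ElementTree(root).write(filename, encoding="utf-8")


def read_tree(filename):
    """Try to parse the file as XML; fall back to unpickling on failure."""
    try:
        return ET.parse(filename).getroot()
    except ET.ParseError:
        with open(filename, "rb") as f:
            return ET.fromstring(pickle.load(f))


# Round trip in both formats; the reader does not need to be told which.
root = ET.Element("root", passageID="1")
ids = []
with tempfile.TemporaryDirectory() as d:
    for binary in (False, True):
        path = os.path.join(d, "p.pickle" if binary else "p.xml")
        write_tree(root, path, binary=binary)
        ids.append(read_tree(path).get("passageID"))
```

Unpickling runs arbitrary code from the file, so a fallback like this is only appropriate for files from a trusted source.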
scripts.replace_tokens_by_dict Module¶
Functions¶
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
read_dictionary_from_file (filename) |
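The `glob` entry above matches the signature and summary of the Python standard library's `glob.glob`, so the following sketch assumes that is the function being re-exported. It shows the effect of the keyword-only `recursive` argument on a temporary directory; the file names are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "sub"))
    for name in ("a.xml", os.path.join("sub", "b.xml"), "c.txt"):
        open(os.path.join(d, name), "w").close()

    # Without recursive=True, only the top-level directory is searched.
    flat = sorted(
        os.path.relpath(p, d) for p in glob.glob(os.path.join(d, "*.xml"))
    )
    # With recursive=True, "**" matches zero or more nested directories.
    deep = sorted(
        os.path.relpath(p, d)
        for p in glob.glob(os.path.join(d, "**", "*.xml"), recursive=True)
    )
```

`glob.glob` returns paths in arbitrary order, which is why the results are sorted before use.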
scripts.site_pickle_to_standard Module¶
Functions¶
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
pickle_site2passage (filename) |
Opens a pickle file containing XML in UCCA site format and returns its parsed Passage object |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.site_to_standard Module¶
Functions¶
check_illegal_combinations (args) |
|
db2passage (handle, pid, user) |
Gets the annotation of user for pid from the DB handle and returns it as a passage. |
fromstring (text[, parser]) |
Parse XML document from string constant. |
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
site2passage (filename) |
Opens a file and returns its parsed Passage object |
write_passage (passage[, output_format, …]) |
Write a given UCCA passage in any format. |
scripts.site_to_text Module¶
Functions¶
db2passage (handle, pid, user) |
Gets the annotation of user for pid from the DB handle and returns it as a passage. |
fromstring (text[, parser]) |
Parse XML document from string constant. |
main (args) |
|
site2passage (filename) |
Opens a file and returns its parsed Passage object |
scripts.split_corpus Module¶
Functions¶
copy (src, dest[, link]) |
|
copyfile (src, dst, *[, follow_symlinks]) |
Copy data from src to dst. |
main (args) |
|
not_split_dir (filename) |
|
numeric (s) |
|
split_passages (directory, train, dev, link) |
scripts.standard_to_pickle Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
:param filename: file name to read from |
main (args) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
scripts.standard_to_sentences Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
extract_terminals (p) |
Returns an iterator over the terminals of the passage p. |
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
normalize (passage[, extra]) |
|
passage2file (passage, filename[, indent, binary]) |
Writes a UCCA passage as a standard XML file or a binary pickle.
:param passage: passage object to write
:param filename: file name to write to
:param indent: whether to indent each line
:param binary: whether to write pickle format (or XML) |
split2sentences (passage[, remarks, lang, ids]) |
|
split_passage (passage, ends[, remarks, ids, …]) |
Split the passage on the given terminal positions.
:param passage: passage to split
:param ends: sequence of positions at which the split passages will end
:param remarks: add original node ID as remarks to the new nodes
:param ids: optional iterable of ids, the same length as ends, to set passage IDs for each split
:param suffix_format: in case ids is None, use this format for the running index suffix
:param suffix_start: in case ids is None, use this starting index for the running index suffix
:return: sequence of passages |
warning (msg, *args, **kwargs) |
Log a message with severity ‘WARNING’ on the root logger. |
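The interplay of `ends`, `ids`, `suffix_format` and `suffix_start` in split_passage can be sketched on a plain token list. The helper `split_at` is hypothetical. The default values, the `base_id` parameter and the use of exclusive Python slice offsets for `ends` are assumptions chosen for this example and may differ from the actual function.

```python
def split_at(tokens, ends, ids=None, suffix_format="%03d", suffix_start=0,
             base_id="p"):
    """Split tokens into (id, chunk) pairs, one chunk per entry in ends."""
    if ids is not None:
        ids = list(ids)
        if len(ids) != len(ends):
            raise ValueError("ids must have the same length as ends")
    chunks = []
    start = 0
    for i, end in enumerate(ends):
        if ids is not None:
            chunk_id = ids[i]
        else:
            # Running index suffix, used only when no explicit ids are given.
            chunk_id = base_id + suffix_format % (i + suffix_start)
        chunks.append((chunk_id, tokens[start:end]))
        start = end
    return chunks


tokens = "I left . She stayed .".split()
auto_ids = split_at(tokens, [3, 6])
explicit_ids = split_at(tokens, [3, 6], ids=["first", "second"])
```

With the assumed defaults, `auto_ids` labels the two chunks `p000` and `p001`, while `explicit_ids` uses the supplied names instead.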
scripts.standard_to_site Module¶
scripts.standard_to_text Module¶
Functions¶
file2passage (filename) |
Opens a file and returns its parsed Passage object. Tries to read both as a standard XML file and as a binary pickle.
:param filename: file name to read from |
get_passages_with_progress_bar (filename_patterns) |
|
glob (pathname, *[, recursive]) |
Return a list of paths matching a pathname pattern. |
main (args) |
|
numeric (x) |
|
to_text (passage[, sentences, lang]) |
Converts from a Passage object to tokenized strings. |
write_text (passage, f, sentences, lang[, …]) |
scripts.unique_roles Module¶
scripts.validate Module¶
Functions¶
Pool |
Returns a process pool object |
check_args (parser, args) |
|
external_write_mode (*args, **kwargs) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
normalize (passage[, extra]) |
|
print_errors (passage_id, errors[, id_len]) |
|
validate (passage[, linkage, multigraph]) |
scripts.visualize Module¶
Functions¶
external_write_mode (*args, **kwargs) |
|
get_passages (filename_patterns, **kwargs) |
|
get_passages_with_progress_bar (filename_patterns) |
|
main (args) |
|
print_text (args, text, suffix) |
|
split2sentences (passage[, remarks, lang, ids]) |