title: ai4lam Metadata/Discovery WG Monthly Meeting
May 11, 2021#
9 AM California | 12 PM Washington DC | 5 PM UK | 6 PM Oslo & Paris
Attending
Name (institution)
Tim Thompson (Yale)
Jeremy Nelson (Stanford)
David Lowe (Texas A&M)
Corey Harper (Elsevier Labs / Univ. of Amsterdam)
Regrets
Name
**Notetaker (alpha by first name): **
[]{#anchor}Helpful Links
[]{#anchor-1}Project Documents and Data
<<<<<<< HEAD#
9c0c03e (Adds remaining 2021 meetings)
[]{#anchor-2}Agenda Topics
Updates, announcements, intros
a. Knowledge Graph Conference: > https://www.knowledgegraph.tech/
b. Interesting convergence of knowledge graphs & AI. Example: kglab > Python library: > https://github.com/DerwenAI/kglab
c.
Interactive session using MARC21 with spaCy NER, https://mybinder.org/v2/gh/AI4LAM/metadata-wg-notebooks/HEAD
a. Binder works off of GitHub repo
b. pymarcspec for extracting XPath-like paths from MARC
c. Working with small dataset of 749 MARC records (about DVDs)
d. Looking at 508 (creation/production + roles), 511 (cast), 7XX > (controlled access points)
e. Question about spaCy handling of names: non-Western names (when > first name, last name is not clear to determine).
i. spaCy is not rule-based\--should be based on statistical model (e.g., conditional random fields, loosely connected to Stanford NER) ii. Could switch spaCy model based on language detection iii. Can also train your own spaCy model based on labeled data (e.g., a few thousand entities, hand-annotated) iv. marcspec can extract fixed-field data as well v. spaCy provides nice rendering of named entities in context vi. Dependency parse graph: spaCy struggling somewhat with role parsing in 508. Getting a little confused with syntax. Doesn\'t work well for lists with no verbs. vii. Dependency vs. constituency parse viii. Would need to train own model for custom entity types, e.g., titles (incorrectly recognized as person, etc.) ix. spaCy entity linking functionality\--includes an entity linker you can train using a custom knowledge base x. Corey & colleagues working on entity linking to extend existing taxonomies. 1. Challenge of finding good vocabs to link to. (OBO > Foundry, Bioportal, but vocabs/ontologies stop getting > maintained). xi. What about matching entities in inverted order vs. direct order? 1. fuzzywuzzy, token-ratio matching, bag of words > approaches 2. Token reordering 3. Building out cross-references xii. Auto-regressive entity retrieval: [*https://arxiv.org/pdf/2010.00904v3.pdf*](https://arxiv.org/pdf/2010.00904v3.pdf) xiii. Entity linking on Papers with Code: [*https://paperswithcode.com/task/entity-linking*](https://paperswithcode.com/task/entity-linking) xiv. NLP Progress: [*http://nlpprogress.com/*](http://nlpprogress.com/)
f. Corey would like to see more librarians involved in tracking, > e.g., taxonomies for dictionary lookups, other metadata for AI > resources
Recording of today’s call available at https://stanford.zoom.us/rec/share/rGFe2syZ7n0Y_XHrH6fAlxC0nkZ5Wa_eEjAiKjyqpVqTFXXqrtrEu3Xf25gCdFQP.Tn7G4MPWQrJjoSP_