title: ai4lam Metadata/Discovery WG Monthly Meeting

November 9, 2021

9 AM California | 12 PM Washington DC | 5 PM UK | 6 PM Oslo & Paris

Connection Information

Topic: AI-LAM Metadata Working Group

Time: This is a recurring meeting. Meet anytime

Join from PC, Mac, Linux, iOS or Android: https://stanford.zoom.us/j/91421044393?pwd=L0VLbnQ0WlE4SDV0MDY5SUhTQnVydz09

Password: 306295

Or iPhone one-tap (US Toll): +18333021536,,91421044393# or +16507249799,,91421044393#

Or Telephone:

Dial: +1 650 724 9799 (US, Canada, Caribbean Toll) or +1 833 302 1536 (US, Canada, Caribbean Toll Free)

Meeting ID: 914 2104 4393

Password: 306295

International numbers available: https://stanford.zoom.us/u/aeoeCDrpd

Meeting ID: 914 2104 4393

Password: 306295

SIP: 91421044393@zoomcrc.com

Password: 306295


  • Jeremy Nelson (Stanford)



**Notetaker (alpha by first name): **

Helpful Links

Project Documents and Data

Agenda Topics

  1. Updates, announcements, intros

  2. David Lowe presentation

    a. Presentation: https://docs.google.com/presentation/d/1A6uKsvv7KPG1OzvoRBwv14M9d9y6rBwiWP7ohs3iBtI/edit (DOI for the article in a special issue of Cataloging and Classification Quarterly: https://doi.org/10.1080/01639374.2021.1998281)

    b. Thought piece disguised as an AI/ML/DS project. Context: student DS project on campus without much text component.

    c. What is ScholComm in Libraries: focus on cycles of scholarly activity an

    d. OAKTrust Repository

    i.  Undergraduate (Honors) Theses
    ii. Graduate Theses and Dissertations
    iii. Faculty Articles, Open Access (OA) version
    iv. Conference papers, posters
    v.  Departmental report
    vi. Local digital collections

    e. Local Data

    i.  Faculty Open Access articles: n = ~8k, of which ~1k have abstracts

    f. Pasteur’s Quadrant (Stokes, 1997): applied and basic research. There need to be real applications for this research, as opposed to just basic research.

    i.  Quest for fundamental understanding? vs. Consideration of

    g. Focused on global standards: 1297.0 Australian and New Zealand Standard

    i.  Pure basic research - Understanding
    ii. Strategic basic - Understanding
    iii. Applied research - Use
    iv. Experimental development - Use

    h. Text Mining Method

    i.  Basic Research - Applied Research -
    ii. Sort into Basic vs. Applied
    iii. Use preferred sentiment analysis-type tools
    iv. Metadata tags can be assigned

    i. Overview

    i.  Librarians labeled (n=200 of n=1000)
        1.  Either "Basic" or "Applied" + verbs
    ii. Student grappled with tools
    iii. BERT yielded good results
    iv. QA/QC UX

    j. Research Questions

    i.  Mine abstracts > denote type of research?
    ii. Establish accuracy of that mining? Measure of accuracy
    iii. Include as metadata in records?
    iv. Flag elements as AI-generated?
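
As an illustration of the last two questions, here is a minimal Python sketch of adding a machine-assigned research type to a record while flagging it as AI-generated. The field names (`research_type`, `assigned_by`, `model`, `confidence`), the `add_ai_label` helper, and the example record are all hypothetical; nothing here reflects the actual OAKTrust schema or the presenter's implementation.

```python
from copy import deepcopy

# The two labels used in the project described in these notes.
ALLOWED_TYPES = {"Basic", "Applied"}


def add_ai_label(record, research_type, confidence, model_name):
    """Return a copy of `record` with a provenance-flagged research type.

    All field names are hypothetical, chosen only for illustration.
    """
    if research_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected research type: {research_type!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    updated = deepcopy(record)  # leave the original record untouched
    updated["research_type"] = {
        "value": research_type,
        "assigned_by": "machine",  # marks the element as AI-generated
        "model": model_name,
        "confidence": confidence,
    }
    return updated


# Placeholder record and score, not real data.
record = {"title": "Example article", "abstract": "We investigate ..."}
labeled = add_ai_label(record, "Basic", 0.91, "example-classifier")
```

Keeping the provenance next to the value lets a discovery interface decide whether, and how, to display the machine-generated label.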

    k. Types per Frascati

    i.  Basic Research
    ii. Applied Research
    iii. Experimental Development
    iv. Shared - Acquiring new knowledge

    l. Labeling

    i.  4 subject librarians labeled 50 abstracts as basic or
        applied research
    ii. Pulled out indicator verbs toward type
    iii. Ranked verbs 1-10 by how indicative they are
    iv. Verb set not used to date

    m. Student Team

    i.  Python Word Cloud: Basic vs. Applied
    ii. Gensim Word2Vec
        1.  Labeled data
            a.  80% used to train
            b.  20% used to validate
    iii. Feed Forward NN: dictionary matching
    iv. BERT
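
The 80/20 split of the labeled data could be sketched as follows, using only the Python standard library. The `split_labeled` helper, the seed, and the placeholder abstracts and labels are assumptions for illustration; the notes do not say how the student team actually performed the split.

```python
import random


def split_labeled(examples, train_fraction=0.8, seed=0):
    """Shuffle labeled examples and split them into train and validation sets."""
    shuffled = list(examples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 200 (abstract, label) pairs standing in for the
# librarian-labeled abstracts; the labels here are invented.
labeled = [
    (f"abstract {i}", "Basic" if i % 2 == 0 else "Applied")
    for i in range(200)
]
train, validate = split_labeled(labeled)
```

With 200 labeled abstracts this gives 160 for training and 40 held out for validation.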

    n. Student team results

    i.  BERT 90% accurate per self-report
    ii. Training data pass, then validation data
    iii. Of 1000 abstracts
         1.  70% determined Basic
         2.  30% Applied
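
For reference, accuracy on validation data and the Basic/Applied shares of a set of predictions can be computed as below. This is a minimal sketch with made-up toy labels; the `accuracy` and `label_shares` helpers are hypothetical, and the numbers are unrelated to the team's reported results.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the human-assigned labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def label_shares(labels):
    """Share of each label in a list of predictions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy validation labels, invented for illustration only.
actual = ["Basic", "Basic", "Applied", "Basic", "Applied"]
predicted = ["Basic", "Applied", "Applied", "Basic", "Applied"]

val_accuracy = accuracy(predicted, actual)
shares = label_shares(predicted)
```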

    o. QA/QC: Dept as Proxy for Applied Research

    i.  University Libraries
    ii. Social Sciences/Humanities
    iii. Engineering
    iv. Atmospheric Sciences
    v.  Educational Psychology
    vi. Construction Science
    vii. Other

    p. QA/QC: Dept as Proxy for Basic Research

    i.  Physics and Astronomy
    ii. Biological and Agricultural Engineering
    iii. Atmospheric Sciences
    iv. Psychology
    v.  Some Engineering
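
One possible reading of the department-as-proxy check, sketched in Python: compare each predicted label with the label implied by the author's department, and skip departments that appear under both headings. The department sets are taken from the two lists above ("Some Engineering" is treated as "Engineering" and "Other" is left out); the helper names and the toy predictions are assumptions, and the notes do not describe how the check was actually carried out.

```python
# Department lists as recorded in these notes.
APPLIED_DEPTS = {
    "University Libraries", "Social Sciences/Humanities", "Engineering",
    "Atmospheric Sciences", "Educational Psychology", "Construction Science",
}
BASIC_DEPTS = {
    "Physics and Astronomy", "Biological and Agricultural Engineering",
    "Atmospheric Sciences", "Psychology", "Engineering",
}
# Departments listed under both headings cannot serve as a proxy.
AMBIGUOUS_DEPTS = APPLIED_DEPTS & BASIC_DEPTS


def proxy_label(department):
    """Label implied by the department, or None if there is no clear proxy."""
    if department in AMBIGUOUS_DEPTS:
        return None
    if department in APPLIED_DEPTS:
        return "Applied"
    if department in BASIC_DEPTS:
        return "Basic"
    return None


def proxy_agreement(predictions):
    """Return (agreeing, compared) for (department, predicted_label) pairs."""
    agreeing = compared = 0
    for department, predicted in predictions:
        expected = proxy_label(department)
        if expected is None:
            continue  # ambiguous or unlisted department: skip
        compared += 1
        agreeing += predicted == expected
    return agreeing, compared


# Toy predictions, invented for illustration only.
predictions = [
    ("Physics and Astronomy", "Basic"),
    ("Construction Science", "Applied"),
    ("Psychology", "Applied"),
    ("Atmospheric Sciences", "Basic"),
    ("Other", "Applied"),
]
agreeing, compared = proxy_agreement(predictions)
```

A low agreement rate would not prove the classifier wrong, since departments are only a rough proxy, but it would point to records worth a human look.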

    q. UX side of AI Metadata

    i.  From Estonian National Archives, labeled "Computer-detected
    ii. Out there; Where humans disagree and risk is low, let the
        algorithm decide?

  3. Survey question review (if time)

    a. https://docs.google.com/document/d/1aMGRqeF-6BrGW7qvRat8V86nCVO-a4SxLbLexJMMx08/edit

    b. Next step, review by external survey expert