November 9, 2021

title: ai4lam Metadata/Discovery WG Monthly Meeting


9 AM California | 12 PM Washington DC | 5 PM UK | 6 PM Oslo & Paris

Connection Information

Topic: AI-LAM Metadata Working Group

Time: This is a recurring meeting. Meet anytime

Join from PC, Mac, Linux, iOS or Android: https://stanford.zoom.us/j/91421044393?pwd=L0VLbnQ0WlE4SDV0MDY5SUhTQnVydz09

Password: 306295

Or iPhone one-tap (US Toll): +18333021536,,91421044393# or +16507249799,,91421044393#

Or Telephone:

Dial: +1 650 724 9799 (US, Canada, Caribbean Toll) or +1 833 302 1536 (US, Canada, Caribbean Toll Free)

Meeting ID: 914 2104 4393

Password: 306295

International numbers available: https://stanford.zoom.us/u/aeoeCDrpd

SIP: 91421044393@zoomcrc.com

Password: 306295

Attending

Jeremy Nelson (Stanford)

Regrets

Name

**Notetaker (alpha by first name):**

Helpful Links

Metadata WG Zotero Group Library

Project Documents and Data

WG charter

WG Google Drive folder

Agenda Topics

Updates, announcements, intros

David Lowe presentation

a. Presentation: https://docs.google.com/presentation/d/1A6uKsvv7KPG1OzvoRBwv14M9d9y6rBwiWP7ohs3iBtI/edit (DOI for the article in a special issue of Cataloging and Classification Quarterly: https://doi.org/10.1080/01639374.2021.1998281)

b. Thought piece disguised as an AI/ML/DS project. Context: student DS project on campus without much text component.

c. What is ScholComm in Libraries, focus on cycles of scholarly activity an

d. OAKTrust Repository

i.  Undergraduate (Honors) Theses
ii. Graduate Theses and Dissertations
iii. Faculty Articles, Open Access (OA) version
iv. Conference papers, posters
v.  Departmental report
vi. Local digital collections

e. Local Data

i.  Faculty Open Access articles: n=\~8k of \~1 have abstracts

f. Pasteur’s Quadrant (Stokes, 1997): Applied and Basic Research. There need to be real applications for this research, as opposed to just basic research.

i.  Quest for fundamental understanding? vs. Consideration of
    use?

g. Focused on Global Standards: 1297.0 Australian and New Zealand Standard

i.  Pure basic research - Understanding
ii. Strategic basic - Understanding
iii. Applied research - Use
iv. Experimental development - Use

h. Text Mining Method

i.  Basic Research - Applied Research -
ii. Sort into Basic vs. Applied
iii. Use preferred sentiment analysis-type tools
iv. Metadata tags can be assigned

i. Overview

i.  Librarians labeled (n=200 of n=1000)

    1.  Either "Basic" or "Applied" + verbs

ii. Student grappled with tools

iii. BERT yielded good results

iv. QA/QC UX

j. Research Questions

i.  Mine abstracts to denote type of research?
ii. Establish accuracy of that mining? Measure of accuracy
iii. Include as metadata in records?
iv. Flag elements as AI-generated?
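
One way questions iii and iv could look in practice is sketched below in Python. This is an illustration only, not something shown in the presentation: the `MetadataElement` class, the `ai_generated`, `model`, and `confidence` fields, and the model name are all hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MetadataElement:
    """One element of a repository record (hypothetical structure)."""
    name: str
    value: str
    ai_generated: bool = False          # assumed flag, not a real schema field
    model: Optional[str] = None         # which tool produced the value
    confidence: Optional[float] = None  # tool-reported score, if any


def research_type_element(label: str, model: str, confidence: float) -> MetadataElement:
    """Build a research-type element that is explicitly flagged as AI-generated."""
    return MetadataElement("research_type", label, ai_generated=True,
                           model=model, confidence=confidence)


# Illustrative record: one human-supplied element, one machine-supplied element.
record = [
    MetadataElement("title", "Example article title"),
    research_type_element("Applied", "bert-classifier", 0.9),
]

# Names of the elements a display layer would mark as computer-generated.
flagged = [element.name for element in record if element.ai_generated]
```

Keeping the flag on the element rather than on the whole record lets a discovery interface label only the machine-assigned values.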

k. Types per Frascati

i.  Basic Research
ii. Applied Research
iii. Experimental Development
iv. Shared - Acquiring new knowledge

l. Labeling

i.  4 subject librarians labeled 50 abstracts as basic or
    applied research
ii. Pulled out indicator verbs toward type
iii. Ranked verbs 1-10 by how indicative they are
iv. Verb set not used to date
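
Since the verb set has not been used yet, the following Python sketch shows one possible way a ranked verb list could score an abstract. The verbs and their 1-10 weights here are made up for illustration; the librarians' actual list is not in these notes.

```python
import re

# Hypothetical indicator verbs with 1-10 weights (not the librarians' list).
BASIC_VERBS = {"investigate": 8, "characterize": 6, "explore": 5}
APPLIED_VERBS = {"develop": 8, "implement": 7, "improve": 5}


def score(abstract: str, weights: dict) -> int:
    """Sum the weights of indicator verbs found in the abstract.

    Matching is on exact lowercase word forms; there is no stemming,
    so "develops" would not match "develop".
    """
    words = re.findall(r"[a-z]+", abstract.lower())
    return sum(weights.get(word, 0) for word in words)


def classify(abstract: str) -> str:
    """Return "Basic", "Applied", or "Undetermined" on a tie."""
    basic = score(abstract, BASIC_VERBS)
    applied = score(abstract, APPLIED_VERBS)
    if basic == applied:
        return "Undetermined"
    return "Basic" if basic > applied else "Applied"
```

A tie (including an abstract with no indicator verbs at all) is reported as "Undetermined" rather than forced into either category.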

m. Student Team

i.  Python Word Cloud: Basic vs. Applied

ii. Gensim Word2Vec

    1.  Labeled data

        a.  80% used to train
        b.  20% used to validate

iii. Feed Forward NN: dictionary matching

iv. BERT
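
The 80/20 train/validate split noted above could be done roughly as follows. This is a minimal sketch using only the Python standard library; the `split_labeled` helper and the placeholder data are assumptions, not the student team's code.

```python
import random


def split_labeled(examples, train_fraction=0.8, seed=0):
    """Shuffle labeled examples and split them into train and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the 200 librarian-labeled abstracts.
labeled = [(f"abstract {i}", "Basic" if i % 2 else "Applied") for i in range(200)]
train, validate = split_labeled(labeled)
```

With 200 labeled abstracts this gives 160 for training and 40 for validation.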

n. Student team results

i.  BERT 90% accurate per self-report

ii. Training data pass, then validation data

iii. Of 1000 abstracts

     1.  70% determined Basic

     2.  30% Applied
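
For reference, accuracy on the validation data and the Basic/Applied shares can be computed as in the Python sketch below. The function names and the five-item example are illustrative only; they are not the team's data or results.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of predictions that match the human labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def label_shares(predicted):
    """Share of each predicted label, e.g. Basic vs. Applied."""
    counts = Counter(predicted)
    return {label: n / len(predicted) for label, n in counts.items()}


# Tiny made-up example: 4 of 5 predictions agree with the human labels.
gold = ["Basic", "Applied", "Basic", "Basic", "Applied"]
pred = ["Basic", "Applied", "Applied", "Basic", "Applied"]
```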

o. QA/QC: Dept as Proxy for Applied Research

i.  University Libraries
ii. Social Sciences/Humanities
iii. Engineering
iv. Atmospheric Sciences
v.  Educational Psychology
vi. Construction Science
vii. Other

p. QA/QC: Dept as Proxy for Basic Research

i.  Physics and Astronomy
ii. Biological and Agricultural Engineering
iii. Atmospheric Sciences
iv. Psychology
v.  Some Engineering
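
The department-as-proxy check could be run as a simple tally of predicted labels per department, flagging departments whose majority label differs from the expected one. The Python below is a sketch under that assumption; the rows and the `expected` mapping are made-up examples using department names from the lists above.

```python
from collections import Counter, defaultdict


def tally_by_department(rows):
    """Count predicted labels per department from (department, label) pairs."""
    table = defaultdict(Counter)
    for department, label in rows:
        table[department][label] += 1
    return table


def departments_to_review(table, expected):
    """Departments whose most common predicted label differs from the proxy."""
    flagged = []
    for department, expected_label in expected.items():
        counts = table.get(department)
        if counts and counts.most_common(1)[0][0] != expected_label:
            flagged.append(department)
    return flagged


# Made-up predictions, for illustration only.
rows = [
    ("Physics and Astronomy", "Basic"),
    ("Physics and Astronomy", "Basic"),
    ("Construction Science", "Applied"),
    ("Construction Science", "Basic"),
    ("Construction Science", "Basic"),
]
expected = {"Physics and Astronomy": "Basic", "Construction Science": "Applied"}
table = tally_by_department(rows)
```

A flagged department is a prompt for human review, not proof that the classifier is wrong, since a department is only a rough proxy for research type.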

q. UX side of AI Metadata

i.  From Estonian National Archives, labeled "Computer-detected
    objects"
ii. Out there: where humans disagree and risk is low, let the
    algorithm decide?

Survey question review (if time)

a. https://docs.google.com/document/d/1aMGRqeF-6BrGW7qvRat8V86nCVO-a4SxLbLexJMMx08/edit

b. Next step: review by external survey expert