title: ai4lam Metadata/Discovery WG Monthly Meeting
November 9, 2021
9 AM California | 12 PM Washington DC | 5 PM UK | 6 PM Oslo & Paris
Connection Information
Topic: AI-LAM Metadata Working Group
Time: This is a recurring meeting. Meet anytime
Join from PC, Mac, Linux, iOS or Android: https://stanford.zoom.us/j/91421044393?pwd=L0VLbnQ0WlE4SDV0MDY5SUhTQnVydz09
Password: 306295
Or iPhone one-tap (US Toll): +18333021536,,91421044393# or +16507249799,,91421044393#
Or Telephone:
Dial: +1 650 724 9799 (US, Canada, Caribbean Toll) or +1 833 302 1536 (US, Canada, Caribbean Toll Free)
Meeting ID: 914 2104 4393
Password: 306295
International numbers available: https://stanford.zoom.us/u/aeoeCDrpd
SIP: 91421044393@zoomcrc.com
Password: 306295
Attending
Jeremy Nelson (Stanford)
Regrets
Name
**Notetaker (alpha by first name):**
Helpful Links
Project Documents and Data
Agenda Topics
Updates, announcements, intros
David Lowe presentation
a. Presentation: https://docs.google.com/presentation/d/1A6uKsvv7KPG1OzvoRBwv14M9d9y6rBwiWP7ohs3iBtI/edit . DOI for the article in a special issue of Cataloging and Classification Quarterly: https://doi.org/10.1080/01639374.2021.1998281
b. Thought piece disguised as an AI/ML/DS project. Context: student DS project on campus without much text component.
c. What is ScholComm in libraries: focus on cycles of scholarly activity an
d. OAKTrust Repository
    i. Undergraduate (Honors) Theses
    ii. Graduate Theses and Dissertations
    iii. Faculty Articles, Open Access (OA) version
    iv. Conference papers, posters
    v. Departmental reports
    vi. Local digital collections
e. Local Data
    i. Faculty Open Access articles: n=\~8k, of which \~1k have abstracts
f. Pasteur's Quadrant (Stokes, 1997): applied and basic research. There need to be real applications for this research, as opposed to just basic research.
i. Quest for fundamental understanding? vs. Consideration of use?
g. Focused on global standards: 1297.0, Australian and New Zealand Standard
    i. Pure basic research - Understanding
    ii. Strategic basic - Understanding
    iii. Applied research - Use
    iv. Experimental development - Use
h. Text Mining Method
    i. Basic Research - Applied Research -
    ii. Sort into Basic vs. Applied
    iii. Use preferred sentiment analysis-type tools
    iv. Metadata tags can be assigned
i. Overview
    i. Librarians labeled (n=200 of n=1000)
        1. Either "Basic" or "Applied" + verbs
    ii. Students grappled with tools
    iii. BERT yielded good results
    iv. QA/QC UX
j. Research Questions
    i. Mine abstracts to denote type of research?
    ii. Establish accuracy of that mining? Measure of accuracy
    iii. Include as metadata in records?
    iv. Flag elements as AI-generated?
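One way to picture questions iii and iv together is a record that carries the machine-assigned type along with a flag saying where it came from. This is a minimal sketch only: the function `tag_record` and every field name (`research_type`, `generated_by`, `reviewed_by_human`, and so on) are hypothetical and do not reflect the OAKTrust schema or anything shown in the presentation.

```python
import json

def tag_record(record, research_type, confidence, model_name):
    """Return a copy of a metadata record with a machine-assigned research type.

    All field names here are illustrative assumptions, not a real schema.
    """
    tagged = dict(record)  # shallow copy so the original record is not modified
    tagged["research_type"] = {
        "value": research_type,
        "generated_by": "machine",   # flag so a display can mark it as AI-generated
        "model": model_name,
        "confidence": confidence,
        "reviewed_by_human": False,  # can be flipped after QA/QC
    }
    return tagged

# Hypothetical placeholder record, not real repository data.
record = {"title": "Example article", "department": "Example department"}
print(json.dumps(tag_record(record, "applied", 0.9, "example-classifier"), indent=2))
```

Keeping the flag next to the value, rather than in a separate log, means a discovery interface could show a "computer-generated" label without a second lookup.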
k. Types per Frascati
    i. Basic Research
    ii. Applied Research
    iii. Experimental Development
    iv. Shared - Acquiring new knowledge
l. Labeling
    i. 4 subject librarians labeled 50 abstracts as basic or applied research
    ii. Pulled out indicator verbs toward type
    iii. Ranked verbs 1-10 by how indicative they are
    iv. Verb set not used to date
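Since the ranked verb set has not been used yet, here is a rough sketch of how such a set could feed a simple scorer. The verbs and weights below are hypothetical placeholders (the librarians' actual lists are not in these notes), and the exact-word matching is a deliberate simplification with no stemming.

```python
import re

# Hypothetical verb weights on the 1-10 scale (higher = more indicative).
# These are NOT the librarians' lists, which are not recorded in these notes.
BASIC_VERBS = {"understand": 9, "characterize": 6}
APPLIED_VERBS = {"implement": 9, "improve": 7}

def verb_score(abstract, weights):
    """Sum the weights of indicator verbs that appear as exact words."""
    words = re.findall(r"[a-z]+", abstract.lower())
    return sum(weights.get(word, 0) for word in words)

def guess_type(abstract):
    """Return 'basic', 'applied', or 'undetermined' from the verb scores."""
    basic = verb_score(abstract, BASIC_VERBS)
    applied = verb_score(abstract, APPLIED_VERBS)
    if basic == applied:
        return "undetermined"
    return "basic" if basic > applied else "applied"

print(guess_type("We implement a tool to improve irrigation scheduling."))
```

A scorer like this would be more of a sanity check against the model output than a replacement for it.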
m. Student Team
    i. Python Word Cloud: Basic vs. Applied
    ii. Gensim Word2Vec
        1. Labeled data
            a. 80% used to train
            b. 20% used to validate
    iii. Feed Forward NN: dictionary matching
    iv. BERT
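The 80/20 split of the labeled data can be sketched with the standard library alone. This is an illustration under assumptions, not the student team's code: the function name `split_labeled`, the fixed seed, and the toy records are all made up for the example.

```python
import random

def split_labeled(records, train_fraction=0.8, seed=0):
    """Shuffle labeled records and split them into train and validation lists."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical placeholder data: (abstract text, label) pairs.
labeled = [(f"abstract {i}", "basic" if i % 2 else "applied") for i in range(200)]
train, validation = split_labeled(labeled)
print(len(train), len(validation))
```

Shuffling before cutting matters when the labeled set is ordered (for example by department), since an unshuffled cut could leave one class out of the validation set.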
n. Student team results
    i. BERT 90% accurate per self-report
    ii. Training data pass, then validation data
    iii. Of 1000 abstracts
        1. 70% determined Basic
        2. 30% Applied
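For reference, the two kinds of numbers reported above (accuracy on held-out labels, and the share of each predicted type) can be computed as below. The helper names and the toy labels are assumptions for illustration; they are not the team's evaluation code or data.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of predictions that match the human-assigned labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one labeled example")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def label_shares(predicted):
    """Share of each predicted label across all predictions."""
    counts = Counter(predicted)
    total = len(predicted)
    return {label: count / total for label, count in counts.items()}

# Hypothetical toy labels, not the project's data.
actual = ["basic", "basic", "applied", "basic", "applied"]
predicted = ["basic", "applied", "applied", "basic", "applied"]
print(accuracy(predicted, actual))
print(label_shares(predicted))
```

Accuracy is only meaningful on the validation portion; measured on the training data it mostly shows how well the model memorized.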
o. QA/QC: Dept as Proxy for Applied Research
    i. University Libraries
    ii. Social Sciences/Humanities
    iii. Engineering
    iv. Atmospheric Sciences
    v. Educational Psychology
    vi. Construction Science
    vii. Other
p. QA/QC: Dept as Proxy for Basic Research
    i. Physics and Astronomy
    ii. Biological and Agricultural Engineering
    iii. Atmospheric Sciences
    iv. Psychology
    v. Some Engineering
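The department-as-proxy check amounts to tabulating predicted labels by department and seeing whether the shares look plausible. A minimal sketch follows; the function names are hypothetical and the label assignments in `rows` are toy values, not results from the project.

```python
from collections import Counter, defaultdict

def labels_by_department(rows):
    """Count predicted labels per department from (department, label) pairs."""
    table = defaultdict(Counter)
    for department, label in rows:
        table[department][label] += 1
    return table

def applied_share(table):
    """Fraction of each department's abstracts predicted as applied."""
    return {
        department: counts["applied"] / sum(counts.values())
        for department, counts in table.items()
    }

# Toy rows for illustration only; the labels are made up.
rows = [
    ("Physics and Astronomy", "basic"),
    ("Physics and Astronomy", "basic"),
    ("Construction Science", "applied"),
    ("Construction Science", "basic"),
]
print(applied_share(labels_by_department(rows)))
```

A department expected to lean applied but showing a low applied share would be a prompt to spot-check those abstracts by hand.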
q. UX side of AI Metadata
    i. From Estonian National Archives, labeled "Computer-detected objects"
    ii. Out there: where humans disagree and risk is low, let the algorithm decide?
Survey question review (if time)
a. https://docs.google.com/document/d/1aMGRqeF-6BrGW7qvRat8V86nCVO-a4SxLbLexJMMx08/edit
b. Next step, review by external survey expert