Reading
**Excavating AI: The Politics of Images in Machine Learning Training Sets** by Kate Crawford and Trevor Paglen
- “Automated interpretation of images is an inherently social and political project, rather than a purely technical one”
- relationship between image + meaning is very nuanced and complex
- 3 layers (ex. The Japanese Female Facial Expression (JAFFE) Database):
    - overall taxonomy (ex. facial expressions depicting the emotions of Japanese women)
    - individual classes (ex. happiness, sadness, surprise, disgust, fear, anger, neutral, etc.)
    - individually labeled images (ex. the content of a single image, such as a woman looking surprised)
- ImageNet: aims to “map out the entire world of objects”
    - synsets, each representing a distinct concept, organized into a nested hierarchy
        - “Chair” → artifact > furnishing > furniture > seat > chair (see the WordNet sketch after these notes)
    - restricted to nouns
- Assumptions underlying visual AI systems:
    - concepts are fixed, universal, and consistent
    - fixed and universal correspondences between images and concepts
    - uncomplicated and measurable ties between images, referents, and labels
    - all concrete nouns are created equal, and abstract nouns also express themselves concretely/visually (ex. anti-Semitism)
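
Since ImageNet’s hierarchy comes straight from WordNet’s noun synsets, the “chair” chain above can be reproduced programmatically. A minimal sketch using NLTK’s WordNet interface (my choice of tool for illustration, not something from the essay):

```python
# A minimal sketch of the synset hierarchy behind ImageNet, using NLTK's
# WordNet interface (an illustration of mine, not code from the essay).
# One-time setup: import nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

# "chair" alone is ambiguous: WordNet stores several distinct noun senses
for synset in wn.synsets('chair', pos=wn.NOUN):
    print(synset.name(), '-', synset.definition())

# Walk the hypernym chain for the furniture sense, root concept first:
# entity > ... > artifact > furnishing > furniture > seat > chair
chair = wn.synset('chair.n.01')
for path in chair.hypernym_paths():
    print(' > '.join(s.name().split('.')[0] for s in path))
```

Printing every sense of “chair” (the furniture, the professorship, the chairperson) also shows exactly the word-to-concept ambiguity the essay is pointing at.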
Reflect on the relationship between labels and images in a machine learning image classification dataset. Who has the power to label images, and how do those labels and the machine learning models trained on them impact society?
- Magritte’s “Ceci n’est pas une pipe” emphasizes that labels do not always reflect the truths and meanings of images
- Some images are mimicked/performed rather than genuine (an image of a woman who is actually angry is not the same as an image of a woman mimicking an angry expression)
- Researchers, developers, and corporations hold the power to label images and shape the meaning of visual data
- These labels carry biases that are baked into ML models → they shape how AI classifies/perceives the world, and those systems are then integrated into hiring, education, healthcare, etc.
Making
This week, I was inspired by the “this or that” filters on Instagram, Snapchat, TikTok, etc. These filters present users with two options, and users choose their preference by performing specific gestures or actions, such as tilting their head or pointing.
I decided on a dog-themed game where users narrow down their favorite dog breed from four options using hand gestures, detected and classified by Teachable Machine.
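
Before any model gets involved, the narrowing itself is just a small tournament bracket. A rough sketch of that logic (the breed names and function names are my own placeholders, not taken from the actual project):

```python
# A rough sketch of the narrowing logic: 4 breeds -> 2 head-to-head
# rounds -> a final round -> one favorite. All names are placeholders.

def play_bracket(options, choose):
    """Pit adjacent pairs against each other until one option remains.
    `choose(left, right)` returns 'left' or 'right'; in the game this
    comes from the player's hand gesture. Assumes an even field size."""
    while len(options) > 1:
        winners = []
        for i in range(0, len(options), 2):
            left, right = options[i], options[i + 1]
            winners.append(left if choose(left, right) == 'left' else right)
        options = winners
    return options[0]

# Usage example with typed input standing in for gestures:
if __name__ == '__main__':
    breeds = ['corgi', 'husky', 'poodle', 'shiba inu']
    pick = lambda l, r: input(f"{l} [left] or {r} [right]? ").strip()
    print('Your favorite:', play_bracket(breeds, pick))
```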
How it works:
- Users show an open hand to select the right option and a closed fist to select the left option.
- Users hold their gesture and click the “next round” button to lock in their choice and advance to the next round (the gesture check is sketched below).
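
Under the hood, each choice is one image-classification call. The game itself would most likely use Teachable Machine’s in-browser TensorFlow.js export; to stay in one language here, this sketch uses Teachable Machine’s Keras export instead. The file name, class order, and OpenCV capture are all assumptions on my part, not details from the actual project:

```python
# A sketch of the gesture check, assuming the model was exported from
# Teachable Machine in Keras format (the browser game itself would more
# likely use the TensorFlow.js export).
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model('keras_model.h5', compile=False)  # assumed export name
LABELS = ['closed_fist', 'open_hand', 'neutral']     # assumed class order

def classify_gesture(frame):
    """Map one BGR webcam frame to a LABELS entry. Teachable Machine
    image models expect 224x224 RGB pixels scaled to [-1, 1]."""
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    x = cv2.resize(rgb, (224, 224)).astype(np.float32) / 127.5 - 1.0
    probs = model.predict(x[np.newaxis, ...], verbose=0)[0]
    return LABELS[int(np.argmax(probs))]

def gesture_choice(cap):
    """Poll frames until a decisive gesture appears: closed fist picks
    the left option, open hand the right. (The real game instead locks
    the held gesture in when the "next round" button is clicked.)"""
    while True:
        ok, frame = cap.read()
        if ok:
            gesture = classify_gesture(frame)
            if gesture == 'closed_fist':
                return 'left'
            if gesture == 'open_hand':
                return 'right'

# cap = cv2.VideoCapture(0)
# play_bracket(breeds, lambda l, r: gesture_choice(cap))  # bracket sketch above
```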