Week 4: HandPose Spotlight

Research

Google Research: Editing material properties of objects with text-to-image models and synthetic data

Screenshot 2024-10-02 at 11.21.36 AM.png

Allows users to smoothly edit material properties, such as color, shininess, or transparency, in images using text-to-image models.

What data might have been used to train the machine learning model?

Synthetic dataset built from 100 3D models of household objects with varying geometric shapes
For each base image, they adjusted a single material attribute at a time at different edit strengths (a scalar value that controls how much the attribute changes)

Screenshot 2024-10-02 at 11.29.08 AM.png

What type of machine learning models did the creator use?

Stable Diffusion 1.5, a text-to-image model
Fine-tuned on the dataset of synthetic images + corresponding edit strengths

→ Model learns how to apply desired material changes while preserving the rest of the object
Input: image, instruction (“change the transparency of the pumpkin”), and edit strength
Output: changes the appearance of the object while retaining the object’s shape and the image lighting

Why did the creator of the project choose to use this machine learning model?

Its ability to generate high-quality, detailed images from any text input

I was super impressed by how well this method could take an image, locate and isolate an object based on the prompt, and make edits, all while preserving the background imagery + lighting. In the chair example, for instance, adjusting the transparency reveals the hidden geometry inside the chair + creates new shadows and highlights to make it look realistic. I’d be interested to see how technology like this may change how we use apps like Photoshop.

Making

For this week’s assignment, I was inspired by the HandPose - Quadrilateral by Jack Du.

I thought it’d be interesting to create a spotlight/magnifying glass effect using the copy() function I experimented with last week + graphics buffers.

I started out by modifying the quadrilateral. Since copy() only takes axis-aligned rectangles, I approximated the height using the thumb + index finger of one hand and the width using both index fingers.

I drew the copy on a buffer to create a spotlight effect, blacking out the space that wasn’t defined by my fingers.

Screenshot 2024-10-02 at 1.48.28 PM.png

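if (hands.length >= 2) {
  // Fingertip keypoints from the two detected hands
  const pointA = hands[1].index_finger_tip;
  const pointB = hands[0].index_finger_tip;
  const pointC = hands[0].thumb_tip; // not used in this snippet
  const pointD = hands[1].thumb_tip;

  // Use the min corner and absolute sizes so the rectangle stays valid
  // even if the hands swap order or the thumb sits above the index finger
  const copyX = min(pointA.x, pointB.x);
  const copyY = min(pointA.y, pointD.y);
  const copyWidth = abs(pointB.x - pointA.x); // distance between the 2 index fingers
  const copyHeight = abs(pointD.y - pointA.y); // thumb to index finger on the same hand

  // Copy that region of the video to the same spot on the canvas
  copy(video, copyX, copyY, copyWidth, copyHeight, copyX, copyY, copyWidth, copyHeight);
}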
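
The snippet above copies straight from the video, so here is a rough sketch of how the buffer step could look. The names spotlightRect, drawSpotlight, and spotlight are placeholders I made up for illustration, and it assumes the buffer was created once in setup() with createGraphics(width, height).

```javascript
// Hypothetical helper (not from the original sketch): turns three fingertip
// keypoints into an axis-aligned rectangle that copy() can use.
function spotlightRect(indexA, indexB, thumbA) {
  return {
    x: Math.min(indexA.x, indexB.x),   // left edge: leftmost index finger
    y: Math.min(indexA.y, thumbA.y),   // top edge: higher of index tip / thumb tip
    w: Math.abs(indexB.x - indexA.x),  // width: distance between index fingers
    h: Math.abs(thumbA.y - indexA.y),  // height: thumb-to-index distance
  };
}

// Assumed to be called from draw(). `video` is the capture and `spotlight`
// is a p5.Graphics buffer the same size as the canvas.
function drawSpotlight(video, spotlight, rect) {
  spotlight.background(0); // black out the whole buffer first
  if (rect.w > 0 && rect.h > 0) {
    // Copy only the finger-defined region of the video into the buffer
    spotlight.copy(video, rect.x, rect.y, rect.w, rect.h,
                          rect.x, rect.y, rect.w, rect.h);
  }
  image(spotlight, 0, 0); // draw the buffer over the canvas
}
```

Making the destination width and height larger than the source ones in the copy() call would stretch the region, which is one way to push this toward a magnifying glass instead of a spotlight.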

I then played around with toggling different filter states when only one hand is detected. When pointing in the upper right corner, the filter switches from a blur effect to a black & white effect.
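
One possible way to wire up that toggle is sketched below. Everything here is an assumption for illustration: the names filterMode, wasInCorner, inUpperRight, updateFilterMode, and applyFilter are made up, the corner is taken to be the top-right quarter of the canvas, and the coordinates are assumed to be unmirrored.

```javascript
// Hypothetical state for the toggle (names are not from the original sketch)
let filterMode = "blur";  // "blur" or "gray"
let wasInCorner = false;  // was the fingertip in the corner on the last frame?

// True when a point sits inside the upper-right corner region.
// cornerFraction (assumed 0.25) sets how much of each side counts as "corner".
function inUpperRight(point, canvasWidth, canvasHeight, cornerFraction = 0.25) {
  return point.x > canvasWidth * (1 - cornerFraction) &&
         point.y < canvasHeight * cornerFraction;
}

// Flip the mode once each time the fingertip enters the corner,
// rather than on every frame it stays there.
function updateFilterMode(point, canvasWidth, canvasHeight) {
  const inCorner = inUpperRight(point, canvasWidth, canvasHeight);
  if (inCorner && !wasInCorner) {
    filterMode = filterMode === "blur" ? "gray" : "blur";
  }
  wasInCorner = inCorner;
  return filterMode;
}

// Assumed to be called from draw() when hands.length === 1
function applyFilter(hands) {
  const mode = updateFilterMode(hands[0].index_finger_tip, width, height);
  if (mode === "gray") {
    filter(GRAY);     // black & white
  } else {
    filter(BLUR, 3);  // blur radius of 3 is an arbitrary choice
  }
}
```

Tracking wasInCorner is what keeps the filter from flickering back and forth while the finger is held in the corner.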