4th, Dynamic Gestures

Now things start getting interesting: it’s time to put it all together and start recognizing meaningful gestures.

After studying some usage scenarios, and trying lots of air gestures in the void like a crazy person o_O , I came to the conclusion that gestures should be divided into 4 main categories:

 

– Fixed Gestures

These are the simplest of them all, just a static gesture to the camera

Examples:

* Mute/Unmute: just show your palm closed, like a “stop” gesture

* Volume Up/Down: doing an “L” sign pointing Up/Down 

[Images: the “L” sign for Volume Up and for Volume Down]

* Channel Up/Down: the same “L” sign, but pointing left or right

[Images: the “L” sign for Channel Up and for Channel Down]

 – Moving Gestures

These are based on a single gesture, but with movement, like linear motion or rotation

Examples:

* Channel Up/Down: showing the index and annular fingers separated, make a Left/Right linear displacement, AKA “the Jedi gesture”

[Images: Channel Up and Channel Down]

* Volume: this is a rotation-based gesture; showing thumb, index and annular, rotate your hand as if you were holding an imaginary volume knob

[Images: Volume Up and Volume Down]

 

– Transition Gestures

In this case the action begins with one gesture and ends with another

Example:

* Power On/Off: show your open palm and then make a fist to turn the TV Off; do the same in reverse to turn it On, AKA “the IronMan gesture”

 

– Writing Gestures

This is a very complex and challenging action, still under development. Basically, you hold index and annular close together and write digits in the air; those digits are sent to the TV just like the number buttons on your remote control.

To make the previous examples possible, a new MediaPipe graph was created, with many new calculators:

 First, a little modification to the static gestures graph:

As you can see, it now includes the “dynamicGestures” subgraph, which is in charge of interpreting the gestures. This new subgraph receives Landmarks, Angles (see part 2 for a detailed description) and Detections (used for gesture classification).

 This is the “dynamicGestures” graph:

 

Let me explain the data flow of this graph:

 

  1. The FlowLimiter (bottom left corner) will only allow passage of new packets once the current packet is processed.

  2. gestureClassifierCalculator (top center) categorizes the incoming gestures and then sends a latch packet to one of the four latches below. That latch then lets successive incoming gesture packets through to the calculator in charge of that gesture category. The gesture categories are defined in a file that the calculator takes as a mandatory parameter in the graph’s configuration:

gestures_types_file_name: "myMediapipe/projects/dynamicGestures/dynamic_gestures_map.txt"

 

  3. Each gesture calculator has a specific set of parameters:

A- “fixedDynamicGesturesCalculator” allows the same gesture to be used for different actions, e.g. the “L” gesture triggers Volume Up/Down and Channel Up/Down based on the rotation of the hand: index pointing up is Volume Up, pointing left is Channel Up, etc. You can also specify a gesture with only one behaviour, as is done here for Mute.

Common to all calculators is the mqtt_message option, subdivided in topic and payload 

Other relevant options are:

– fixed_time_out_s: the maximum time to wait for the gesture to complete before releasing control of the incoming packets.

– time_between_actions: if set, indicates how often the action should auto repeat.

– landmark_id & angle_number: the angle used to tell apart the actions that share the same gesture; in this case 0, the horizontal rotation of the hand.
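To make these options a bit more concrete, here is a minimal Python sketch of the idea. It is not the real calculator (which is a C++ MediaPipe calculator): the class names, the angle ranges and the "tv/volume" topic are all made up for illustration, and fixed_time_out_s is left out. It only shows how one hand shape could map to different topic/payload pairs depending on an angle, with time_between_actions limiting the auto repeat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AngleAction:
    """One action bound to a range of hand rotation angles (degrees, hypothetical)."""
    min_deg: float
    max_deg: float
    topic: str
    payload: str


class FixedGestureSketch:
    """Illustrative stand-in for the fixed gesture logic, not the real calculator."""

    def __init__(self, actions, time_between_actions=0.5):
        self.actions = actions
        self.time_between_actions = time_between_actions
        self._last_fired: Optional[float] = None

    def process(self, angle_deg: float, now_s: float):
        """Return (topic, payload) when an action should fire, else None."""
        for action in self.actions:
            if action.min_deg <= angle_deg < action.max_deg:
                break
        else:
            return None  # the angle does not match any configured action
        # Auto repeat: fire right away, then at most once per interval.
        if (self._last_fired is not None
                and now_s - self._last_fired < self.time_between_actions):
            return None
        self._last_fired = now_s
        return (action.topic, action.payload)
```

Feeding it the same angle on every frame yields one action per time_between_actions interval, which is roughly what holding the “L” sign to keep raising the volume would need.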

 

B- “movingDynamicGesturesCalculator” also allows the same gesture to be used for different actions, but based on displacement or rotation

– action_type: whether the movement is a ROTATION or a TRASLATION.

– landmark_id & angle_number: the angle in case of a rotation based gesture

– time_between_actions & auto_repeat: whether and how often the action should be repeated

– max_repeat: maximum number of repetitions of the action, used to prevent an excessive number of actions, e.g. turning the volume up too loud

– action_threshold: specific to translation movements, the distance the hand must travel before the action is triggered

– This calculator also has the MQTT topic, but the payload is split into positive_payload and negative_payload. For the volume action, it sends the positive or negative payload depending on whether the hand rotates clockwise or counterclockwise; for channels, Up or Down depends on whether the hand moves left to right or right to left.
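As a rough illustration of the translation case, the following Python sketch shows how action_threshold, max_repeat and the positive/negative payloads could interact. It is an assumption-laden toy, not the actual C++ calculator: the class name is invented, positions are plain numbers, and treating increasing x as the positive direction is my own choice for the example.

```python
class MovingGestureSketch:
    """Illustrative translation-based gesture, not the real calculator."""

    def __init__(self, action_threshold, max_repeat,
                 positive_payload, negative_payload):
        self.action_threshold = action_threshold
        self.max_repeat = max_repeat
        self.positive_payload = positive_payload
        self.negative_payload = negative_payload
        self._anchor_x = None  # position where the current step started
        self._fired = 0        # actions emitted during this gesture

    def process(self, x):
        """Feed one horizontal hand position; return a payload or None."""
        if self._anchor_x is None:
            self._anchor_x = x  # first sample only sets the reference point
            return None
        if self._fired >= self.max_repeat:
            return None  # cap reached, ignore further movement
        travelled = x - self._anchor_x
        if abs(travelled) < self.action_threshold:
            return None  # hand has not moved far enough yet
        self._anchor_x = x  # next step is measured from here
        self._fired += 1
        # Assumption: increasing x counts as the positive direction.
        return self.positive_payload if travelled > 0 else self.negative_payload
```

A rotation-based gesture could follow the same pattern with an angle in place of x.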

C- “transitionDynamicGesturesCalculator”: these begin with one gesture and end with another

– start_action & end_action: the initial and final gestures (should be renamed in the future)

– time_out_s, topic & payload: same as in the “fixedDynamicGesturesCalculator”
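A transition gesture is essentially a tiny state machine. The Python sketch below shows one way it could work, assuming gestures arrive as plain strings; the class name and the "PALM"/"FIST" labels are hypothetical, and restarting the timer while the start gesture is held is a design choice of this example, not something taken from the real calculator.

```python
class TransitionGestureSketch:
    """Illustrative start -> end gesture detector, not the real calculator."""

    def __init__(self, start_action, end_action, time_out_s):
        self.start_action = start_action
        self.end_action = end_action
        self.time_out_s = time_out_s
        self._started_at = None  # time the start gesture was last seen

    def process(self, gesture, now_s):
        """Return True when the start gesture is followed by the end gesture in time."""
        if (self._started_at is not None
                and now_s - self._started_at > self.time_out_s):
            self._started_at = None  # waited too long, forget the start gesture
        if gesture == self.start_action:
            # Assumption: holding the start gesture keeps refreshing the timer.
            self._started_at = now_s
            return False
        if gesture == self.end_action and self._started_at is not None:
            self._started_at = None
            return True
        return False
```

Power On would then simply be a second instance with the start and end gestures swapped.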

D- As mentioned, “writingDynamicGesturesCalculator” is still under development and will be discussed when implemented

 

  4.  After the corresponding calculator has “transformed” a gesture or series of gestures into an executable action, it generates two packets:

    1. An MQTT packet with the required topic and payload

    2. A ……_gesture_clear packet that releases the flow control of gesture packets from the current calculator back to the “gestureClassifierCalculator”

  5.  Lastly, the “MqttPublisherCalculator” will publish the received action to the specified broker:

This calculator is built on the super simple and easy to understand simple-mqtt-client by mec-kon.
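The latch and clear mechanism described in the steps above can be sketched in a few lines of Python. This is only a toy model under several assumptions: the real graph uses MediaPipe packets and separate latch calculators, the gesture-to-category mapping is shown as a dict because the format of the map file is not described here, and the class and handler names are invented.

```python
class LatchRouterSketch:
    """Toy model of the classifier plus latches, not the real graph."""

    def __init__(self, gesture_types, handlers):
        self.gesture_types = gesture_types  # gesture name -> category (hypothetical)
        self.handlers = handlers            # category -> callable(gesture) -> (action, clear)
        self.latched = None                 # category currently holding the flow

    def process(self, gesture):
        """Route one gesture; return the resulting action or None."""
        if self.latched is None:
            # No calculator in control: classify and open the matching latch.
            self.latched = self.gesture_types.get(gesture)
            if self.latched is None:
                return None  # unknown gesture, nothing to route
        # While latched, every gesture goes to the same handler.
        action, clear = self.handlers[self.latched](gesture)
        if clear:
            self.latched = None  # the "clear" signal hands control back
        return action
```

Note that if a handler never returns its clear flag, the router stays latched forever, which is the freeze risk discussed in the note that follows.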

 

Note: I’m not 100% convinced by this flow control mechanism; it seems unreliable and susceptible to freezes if a particular gesture calculator goes haywire. Initially I evaluated other models but ran into issues that would have taken me too much time to solve. Now that this is released as open source, I’m eager to get some help on the matter.

The issue with other approaches is that sending packets “upstream” in the flow generates sync errors.

I remember reading a suggestion for a similar problem in the issues section of MediaPipe’s GitHub, where someone proposed using callbacks, but this seems a bit nasty to me, since the whole idea behind calculators is that they are supposed to be independent execution units.

A quick, temporary fix could be a “WatchDog” calculator that generates a latch release packet after a certain time, with an input monitoring the output of the gesture calculators.
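Such a watchdog could look roughly like the Python sketch below. It is purely hypothetical, since no such calculator exists yet: the class and method names are invented, and timestamps are passed in by hand, where a real calculator would presumably use packet timestamps.

```python
class WatchdogSketch:
    """Hypothetical watchdog: forces a latch release after a timeout."""

    def __init__(self, time_out_s):
        self.time_out_s = time_out_s
        self._armed_at = None  # time a gesture calculator took control

    def arm(self, now_s):
        """Call when a latch opens and a gesture calculator takes over."""
        self._armed_at = now_s

    def disarm(self):
        """Call when the gesture calculator emits its own clear packet."""
        self._armed_at = None

    def poll(self, now_s):
        """Return True once if the latch has been held for too long."""
        if self._armed_at is None:
            return False
        if now_s - self._armed_at >= self.time_out_s:
            self._armed_at = None  # release only once per arming
            return True
        return False
```

In the graph, a True result from poll would translate into a latch release packet sent on behalf of the stuck calculator.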

If you got this far you definitely deserve a treat 😀 : this is the GitHub repo for myMediapipe

…still lots to do