Phon-Amnuaisuk, Somnuk, Murata, Ken T., Pavarangkoon, Praphan, Mizuhara, Takamichi and Hadi, Shiqah (2019) Children Activity Descriptions from Visual and Textual Associations. In: Lecture Notes in Computer Science, Multi-disciplinary Trends in Artificial Intelligence. Springer International Publishing, 121-132.
Augmented visual monitoring devices that can describe children’s activities, i.e., whether they are asleep, awake, crawling or climbing, open up possibilities for various applications that promote safety and well-being among children. We explore children’s activity description based on an encoder-decoder framework. The correlations between the semantics of an image and its textual description are captured using a convolutional neural network (CNN) and a recurrent neural network (RNN). Encoding semantic information as activation patterns of the CNN and decoding the textual description with a probabilistic language model based on the RNN can produce relevant descriptions, but these often lack precision. This is because a probabilistic model generates descriptions based on the frequency of words conditioned on their contexts. In this work, we explore the effects of adding context, such as domain-specific images and pose information, to the encoder-decoder models.
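The point about frequency-driven decoding can be illustrated with a small sketch. This is not the CNN/RNN model of the paper: it swaps in a plain bigram count model, and the captions, the pose labels ("lying", "upright") and the function names `train` and `describe` are all hypothetical, chosen only to show how extra context may shift the most frequent next word.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (pose label, caption). Not taken from the paper.
CAPTIONS = [
    ("lying", "the child is sleeping"),
    ("lying", "the child is sleeping"),
    ("lying", "the child is sleeping"),
    ("upright", "the child is climbing"),
    ("upright", "the child is climbing"),
    ("lying", "the child is crawling"),
]


def train(captions, use_pose):
    """Count next-word frequencies conditioned on the previous word
    and, optionally, on the pose label as additional context."""
    counts = defaultdict(Counter)
    for pose, text in captions:
        words = ["<s>"] + text.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            key = (pose, prev) if use_pose else prev
            counts[key][nxt] += 1
    return counts


def describe(counts, pose=None, max_len=10):
    """Greedy decoding: always emit the most frequent next word."""
    prev, out = "<s>", []
    for _ in range(max_len):
        key = (pose, prev) if pose is not None else prev
        if key not in counts:
            break
        nxt = counts[key].most_common(1)[0][0]
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt
    return " ".join(out)


# Without pose context the most frequent caption wins for every input.
print(describe(train(CAPTIONS, use_pose=False)))
# With pose context the same decoder can pick a less frequent activity.
print(describe(train(CAPTIONS, use_pose=True), pose="upright"))
```

In this toy setting the context-free decoder always falls back to the most common activity, while conditioning on the pose label lets a rarer activity surface. That is the kind of effect the added context is meant to have, under the simplifying assumptions stated above.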
Item Type:
Book Section
Identification Number (DOI):
Deposited by:
Automated System
Date Deposited:
2021-09-06 03:38:22
Last Modified:
2021-09-23 20:16:19