GitHub - sally20921/MDANforDramaQA2019: 2nd place for DramaQA challenge 2019
source link: https://github.com/sally20921/MDANforDramaQA2019
Multimodal dual attention networks for 2019 DramaQA challenge
This repository contains code to conduct DramaQA with Multimodal Dual Attention Networks.
data/ (preprocessing): data loader; supports image loading, feature extraction, and feature caching
model/ (attention module, multimodal fusion):
- 'attention_fusion.py': the Multimodal Dual Attention Networks model
- 'temporal_graph.py': submodules
What has changed from the starter code (https://github.com/skaro94/vtt_challenge_2019)
- A new model (Multimodal Dual Attention Networks) has been added; it is specified in 'attention_fusion.py'.
- The files needed to train the model have been changed accordingly ('config.py', 'train.py', 'ckpt.py', etc.).
Dependency
We use Python 3 (3.5.2); Python 2 is not supported. We use PyTorch (1.1.0), though tensorflow-gpu is necessary to launch TensorBoard.
Python packages: fire for the command-line API
Data Folder Structure
data/
    AnotherMissOh/
        AnotherMissOh_images/
            $IMAGE_FOLDERS
        AnotherMissOh_QA/
            AnotherMissOhQA_train_set.json
            AnotherMissOhQA_val_set.json
            AnotherMissOhQA_test_set.json
            $QA_FILES
        AnotherMissOh_subtitles.json
Install
git clone --recurse-submodules (this repo)
cd $REPO_NAME/code (use python >= 3.5)
pip install -r requirements.txt
python -m nltk.downloader 'punkt'
Place the data folder at data/.
How to Use
training
cd code
python cli.py train
Access the prompted tensorboard port to view basic statistics.
At the end of every epoch, a checkpoint file will be saved to /data/ckpt/OPTION_NAMES.
- Use the 'video_type' config option to use 'shot' or 'scene' type data.
- If you want to run the code with lower memory requirements, use the following flags:
  python cli.py train --extractor_batch_size=$BATCH --num_workers=$NUM_WORKERS
- You can use the 'use_inputs' config option to change the set of inputs to use. The default value is ['images', 'subtitle']. It is forbidden to use the 'description' input for the challenge.

For further configuration, take a look at startup/config.py and fire.
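The fire package maps `--name=value` command-line flags onto function or config arguments, which is how the flags above reach the training code. A minimal illustration of that convention (python-fire itself is far more general; this stand-in only mirrors the `--option=value` form used by the commands in this README):

```python
import shlex

def parse_flags(argv):
    """Illustrative fire-style '--name=value' flag parser.

    Collects every '--name=value' token into a kwargs dict,
    ignoring positional arguments such as the command name.
    """
    kwargs = {}
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            kwargs[name] = value
    return kwargs

# e.g. the low-memory training command from above:
flags = parse_flags(shlex.split(
    "train --extractor_batch_size=4 --num_workers=2"))
```

Here `flags` comes out as `{'extractor_batch_size': '4', 'num_workers': '2'}`; fire performs the same mapping (plus type casting) before handing the values to the config.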
evaluation
cd code
python cli.py evaluate --ckpt_name=$CKPT_NAME
Substitute CKPT_NAME with your preferred checkpoint file, e.g. --ckpt_name='feature*/loss_1.34'.
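The `*` in the example suggests checkpoint names can be given as glob-style patterns. A hypothetical sketch of such matching using the stdlib (the repo's actual resolution logic in ckpt.py may differ):

```python
from fnmatch import fnmatch

def resolve_ckpt(pattern, available):
    """Return checkpoint names matching a glob-style pattern.

    Hypothetical helper: 'available' would be the names found
    under the checkpoint directory (data/ckpt/...).
    """
    return [name for name in available if fnmatch(name, pattern)]

ckpts = ["feature_v1/loss_1.34", "feature_v2/loss_1.34", "baseline/loss_2.00"]
resolve_ckpt("feature*/loss_1.34", ckpts)
```

With the list above, the pattern from the example matches both `feature_v1/loss_1.34` and `feature_v2/loss_1.34`.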
making submissions
python cli.py infer --model_name=$MODEL_NAME --ckpt_name=$CKPT_NAME
The above command will save the output at the prompted location.
evaluating submissions
cd code/scripts
python eval_submission.py -y $SUBMISSION_PATH -g $DATA_PATH
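Conceptually, the evaluation script compares submitted answers against ground truth. A hedged sketch of multiple-choice accuracy, assuming both files reduce to a mapping from question id to chosen answer index (the actual schema in eval_submission.py may differ):

```python
def accuracy(pred, gold):
    """Fraction of gold questions answered correctly.

    pred, gold: dicts mapping question id -> answer index
    (assumed format for illustration). Questions missing from
    pred count as wrong.
    """
    if not gold:
        return 0.0
    correct = sum(1 for qid, ans in gold.items() if pred.get(qid) == ans)
    return correct / len(gold)
```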
Default Preprocessing Details
- Images are resized to 224x224 for preprocessing (the ResNet input size).
- The last layer of ResNet-50 is used for feature extraction (default behaviour).
- glove.6B.300d is used for pretrained word embeddings.
- Image feature caches are stored after feature extraction (for faster data loading).
- nltk.word_tokenize is used for tokenization.
- All images for a scene question are concatenated in temporal order.
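The last point can be sketched as follows. Plain lists stand in for the feature tensors the repo actually uses, and the dict-of-shots input shape is an assumption for illustration:

```python
def concat_scene_features(shot_features):
    """Concatenate per-shot image features for a scene question
    in temporal order.

    shot_features: dict mapping shot index -> list of per-frame
    feature vectors (illustrative stand-in for cached tensors).
    Shots are visited in ascending index order, so the result
    preserves the temporal order of the scene.
    """
    ordered = []
    for shot in sorted(shot_features):
        ordered.extend(shot_features[shot])
    return ordered
```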