GitHub - sally20921/MDANforDramaQA2019: 2nd place for DramaQA challenge 2019
source link: https://github.com/sally20921/MDANforDramaQA2019
Multimodal dual attention networks for 2019 DramaQA challenge
This repository contains code to conduct DramaQA with Multimodal Dual Attention Networks.
data/ (preprocessing): data loader; supports image loading, feature extraction, and feature caching
model/ (attention module, multimodal fusion):
- 'attention_fusion.py': the Multimodal Dual Attention Networks model
- 'temporal_graph.py': submodules
What has changed from the starter code (https://github.com/skaro94/vtt_challenge_2019)
- A new model (Multimodal Dual Attention Networks) has been added; it is specified in 'attention_fusion.py'.
- The files needed to train the model have been changed accordingly ('config.py', 'train.py', 'ckpt.py', etc.).
Dependency
We use Python 3 (3.5.2); Python 2 is not supported. We use PyTorch (1.1.0), though tensorflow-gpu is necessary to launch TensorBoard.
Python packages: fire for the command-line API
Data Folder Structure
data/
    AnotherMissOh/
        AnotherMissOh_images/
            $IMAGE_FOLDERS
        AnotherMissOh_QA/
            AnotherMissOhQA_train_set.json
            AnotherMissOhQA_val_set.json
            AnotherMissOhQA_test_set.json
            $QA_FILES
        AnotherMissOh_subtitles.json
Install
git clone --recurse-submodules (this repo)
cd $REPO_NAME/code (use python >= 3.5)
pip install -r requirements.txt
python -m nltk.downloader 'punkt'
Place the data folder at data/.
How to Use
training
cd code
python cli.py train
Access the prompted tensorboard port to view basic statistics.
At the end of every epoch, a checkpoint file will be saved to /data/ckpt/OPTION_NAMES.
- Use the 'video_type' config option to use 'shot' or 'scene' type data.
- If you want to run the code with lower memory requirements, use the following flags:
  python cli.py train --extractor_batch_size=$BATCH --num_workers=$NUM_WORKERS
- You can use the 'use_inputs' config option to change the set of inputs to use. The default value is ['images', 'subtitle']. It is forbidden to use the 'description' input for the challenge.

For further configuration, take a look at startup/config.py and fire.
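The fire package maps `--name=value` command-line flags onto function or config arguments, which is how the flags above reach the training code. A minimal illustration of that convention (python-fire itself is far more general; this stand-in only mirrors the `--option=value` form used by the commands in this README):

```python
import shlex

def parse_flags(argv):
    """Illustrative fire-style '--name=value' flag parser.

    Collects every '--name=value' token into a kwargs dict,
    ignoring positional arguments such as the command name.
    """
    kwargs = {}
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            kwargs[name] = value
    return kwargs

# e.g. the low-memory training command from above:
flags = parse_flags(shlex.split(
    "train --extractor_batch_size=4 --num_workers=2"))
```

Here `flags` comes out as `{'extractor_batch_size': '4', 'num_workers': '2'}`; fire performs the same mapping (plus type casting) before handing the values to the config.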
evaluation
cd code
python cli.py evaluate --ckpt_name=$CKPT_NAME
Substitute CKPT_NAME with your preferred checkpoint file, e.g. --ckpt_name='feature*/loss_1.34'.
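The `*` in the example suggests checkpoint names can be given as glob-style patterns. A hypothetical sketch of such matching using the stdlib (the repo's actual resolution logic in ckpt.py may differ):

```python
from fnmatch import fnmatch

def resolve_ckpt(pattern, available):
    """Return checkpoint names matching a glob-style pattern.

    Hypothetical helper: 'available' would be the names found
    under the checkpoint directory (data/ckpt/...).
    """
    return [name for name in available if fnmatch(name, pattern)]

ckpts = ["feature_v1/loss_1.34", "feature_v2/loss_1.34", "baseline/loss_2.00"]
resolve_ckpt("feature*/loss_1.34", ckpts)
```

With the list above, the pattern from the example matches both `feature_v1/loss_1.34` and `feature_v2/loss_1.34`.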
making submissions
python cli.py infer --model_name=$MODEL_NAME --ckpt_name=$CKPT_NAME
The above command will save the output at the prompted location.
evaluating submissions
cd code/scripts
python eval_submission.py -y $SUBMISSION_PATH -g $DATA_PATH
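Conceptually, the evaluation script compares submitted answers against ground truth. A hedged sketch of multiple-choice accuracy, assuming both files reduce to a mapping from question id to chosen answer index (the actual schema in eval_submission.py may differ):

```python
def accuracy(pred, gold):
    """Fraction of gold questions answered correctly.

    pred, gold: dicts mapping question id -> answer index
    (assumed format for illustration). Questions missing from
    pred count as wrong.
    """
    if not gold:
        return 0.0
    correct = sum(1 for qid, ans in gold.items() if pred.get(qid) == ans)
    return correct / len(gold)
```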
Default Preprocessing Details
- Images are resized to 224x224 for preprocessing (the ResNet input size).
- The last layer of ResNet-50 is used for feature extraction (default behaviour).
- glove.6B.300d is used for pretrained word embeddings.
- Image feature caches are stored after feature extraction (for faster data loading).
- nltk.word_tokenize is used for tokenization.
- All images for a scene question are concatenated in temporal order.
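The last point can be sketched as follows. Plain lists stand in for the feature tensors the repo actually uses, and the dict-of-shots input shape is an assumption for illustration:

```python
def concat_scene_features(shot_features):
    """Concatenate per-shot image features for a scene question
    in temporal order.

    shot_features: dict mapping shot index -> list of per-frame
    feature vectors (illustrative stand-in for cached tensors).
    Shots are visited in ascending index order, so the result
    preserves the temporal order of the scene.
    """
    ordered = []
    for shot in sorted(shot_features):
        ordered.extend(shot_features[shot])
    return ordered
```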