
Minimal tutorial on packing (pack_padded_sequence) and unpacking (pad_packed_sequence) sequences in pytorch.

source link: https://gist.github.com/HarshTrivedi/f4e7293e941b17d19058f6fb90ab0fec


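The gist walks through packing a padded batch, running it through an LSTM, and unpacking the result. A minimal sketch of that round trip (illustrative shapes and names, not the gist's exact code):

    import torch
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    # Three already-padded sequences of lengths 5, 3 and 2, in batch_first layout.
    padded = torch.randn(3, 5, 4)              # (batch, max_len, feature)
    lengths = torch.tensor([5, 3, 2])          # sorted in decreasing order

    lstm = torch.nn.LSTM(input_size=4, hidden_size=6, batch_first=True)

    # Pack: the LSTM then skips the pad positions entirely.
    packed = pack_padded_sequence(padded, lengths, batch_first=True)
    packed_output, (h_n, c_n) = lstm(packed)

    # Unpack: back to a dense (batch, max_len, hidden) tensor plus the true lengths.
    unpacked, out_lengths = pad_packed_sequence(packed_output, batch_first=True)
    print(unpacked.shape, out_lengths)         # torch.Size([3, 5, 6]) tensor([5, 3, 2])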

Please, how can I get the output without the padding?

al025 commented Mar 10, 2020

Thanks, the visualization for packed_output and unpacked_output is really helpful!

Great tutorial! Easy to grasp the concepts on packing and padding.

This is great!!!

Great tutorial!
A question: in which situations might we want to use pad_packed_sequence? When calculating the loss, wouldn't it be simpler to work with packed (pad-free) scores (the LSTM outputs) and packed targets? Or do I sometimes need to pad the scores and targets using pad_packed_sequence? If yes, when is it used?

Thank you very much!
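Regarding when pad_packed_sequence is needed: both routes in the question above are workable. A sketch of the two options, assuming a per-token classification loss (scores, targets and lengths are illustrative names, not from the gist):

    import torch
    import torch.nn.functional as F
    from torch.nn.utils.rnn import pack_padded_sequence

    # Padded per-token class scores and integer targets, plus the true lengths.
    scores = torch.randn(3, 5, 10)             # (batch, max_len, num_classes)
    targets = torch.randint(0, 10, (3, 5))     # (batch, max_len)
    lengths = torch.tensor([5, 3, 2])

    # Option 1: pack both, then compute the loss on the flat .data tensors (no pads involved).
    packed_scores = pack_padded_sequence(scores, lengths, batch_first=True)
    packed_targets = pack_padded_sequence(targets, lengths, batch_first=True)
    loss_packed = F.cross_entropy(packed_scores.data, packed_targets.data)

    # Option 2: keep everything padded and mask out the pad positions explicitly.
    mask = torch.arange(5)[None, :] < lengths[:, None]     # (batch, max_len) bool
    loss_padded = F.cross_entropy(scores[mask], targets[mask])

pad_packed_sequence tends to be handy mainly when a later layer expects a dense (batch, max_len, hidden) tensor, for example attention or a mask-based loss as in option 2.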

Great work!

Thanks a lot for putting this together.

RudRho commented Jul 20, 2020

Line #146 is the icing on the cake.

Awesome!

Great work!

Great work.

Thanks a lot! +1

thank you, it is very helpful!

This is great! Congratulations!

Bro, where did the len object in line 51 come from?

Perfectly explained! I was always confused about what data goes into the batch.

laifi commented Nov 19, 2020

Why is the "sort instances by sequence length in descending order" step needed?

pack_padded_sequence does not need sorting anymore; it's a parameter of the function (see the docs):

**enforce_sorted** (bool, optional) – if True, the input is expected to contain sequences sorted by length in a decreasing order. If False, the input will get sorted unconditionally. Default: True.
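A small sketch of that flag (illustrative tensors; note the unsorted lengths):

    import torch
    from torch.nn.utils.rnn import pack_padded_sequence

    padded = torch.randn(3, 5, 4)              # (batch, max_len, feature)
    lengths = torch.tensor([2, 5, 3])          # NOT in decreasing order

    # With enforce_sorted=False, pack_padded_sequence sorts internally and
    # remembers the permutation, so no manual sort/unsort is needed.
    packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)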

rayryeng commented Nov 22, 2020


@jackfrost29 - len is a built-in Python function. When you call len on an object, it invokes that object's __len__ method. The usual understanding is that len gives the length / size of whatever object you pass to it. In this case, the objects are lists of tokens, so mapping len over vectorized_seqs gives the length of every token list in vectorized_seqs.
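For example (a sketch; the vectorized_seqs value here is illustrative):

    import torch

    vectorized_seqs = [[4, 1, 7], [2, 9], [5, 3, 3, 8, 1]]   # token-id lists of varying length

    # map(len, ...) calls each list's __len__, yielding the length of every sequence.
    seq_lengths = torch.LongTensor(list(map(len, vectorized_seqs)))
    print(seq_lengths)                                       # tensor([3, 2, 5])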

Wonder why nobody complains about lines 120-138, as the packed sequence is clearly wrong.

Clearly, the first three rows in the packed sequence are not l, m, t but l, u, s for example. There are also too many closing brackets in line 132.

Pretty helpful, thank you

Thank you very much. It's a very important paper.

You sort them, then you need to get back to the original order, right? I want to use the hidden state, is this right?
    a_lengths, idx = text_length.sort(0, descending=True)
    _, un_idx = t.sort(idx, dim=0)
    seq = text[idx]

    seq = self.dropout(self.embedding(seq))

    a_packed_input = t.nn.utils.rnn.pack_padded_sequence(input=seq, lengths=a_lengths.to('cpu'), batch_first=True)
    packed_output, (hidden, cell) = self.rnn(a_packed_input)
    out, _ = t.nn.utils.rnn.pad_packed_sequence(packed_output, batch_first=True)
    hidden = self.dropout(t.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1))

    hidden = t.index_select(hidden, 0, un_idx)
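The sort/unsort step in isolation (a standalone sketch with illustrative tensors), showing that un_idx restores the original batch order:

    import torch

    x = torch.tensor([10., 20., 30.])          # per-sequence values in original batch order
    lengths = torch.tensor([3, 5, 2])

    _, idx = lengths.sort(0, descending=True)  # original -> sorted permutation
    _, un_idx = idx.sort(0)                    # inverse permutation

    x_sorted = x[idx]                          # order used for packing: [20., 10., 30.]
    x_restored = torch.index_select(x_sorted, 0, un_idx)
    assert torch.equal(x_restored, x)          # back to the original batch order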

Just what I was looking for, thanks.

elch10 commented Dec 30, 2021

I can't find any performance comparison. Did anyone compare using pack_padded_sequence with just using padded sequences?
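A quick way to measure this yourself (a rough sketch; the numbers depend heavily on hardware and on how much padding the batch contains):

    import time
    import torch
    from torch.nn.utils.rnn import pack_padded_sequence

    lstm = torch.nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
    padded = torch.randn(128, 100, 32)                    # short sequences padded out to length 100
    lengths = torch.randint(5, 20, (128,)).sort(descending=True).values

    def avg_time(fn, n=20):
        start = time.perf_counter()
        for _ in range(n):
            fn()
        return (time.perf_counter() - start) / n

    t_padded = avg_time(lambda: lstm(padded))
    t_packed = avg_time(lambda: lstm(pack_padded_sequence(padded, lengths, batch_first=True)))
    print(f"padded: {t_padded:.4f}s   packed: {t_packed:.4f}s")

Packing tends to pay off when the true lengths are much shorter than max_len; with little padding, the packing bookkeeping can dominate.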

Y-jiji commented Mar 19, 2022

Why is the "sort instances by sequence length in descending order" step needed?

If you want to export this model to ONNX, the enforce_sorted option must be True.
However, if the model is not going to be used in production, you can set enforce_sorted=False to avoid the manual sorting.

Superb!

Very helpful!

hungkien05 commented Jul 7, 2022


The most easy-to-understand explanation I have read!

Awesome!


