On Input Formatting and How Lists are Represented in Input Files

source link: https://codeforces.com/blog/entry/113262
By xiaowuc1, 41 hour(s) ago, In English

When preparing the problems for the February 25th NA ICPC regionals, we had some discussion around how lists should be formatted when given as input to contestants.

The primary argument motivating some design decisions was consistency: all lists should be formatted the same way, with a list expressed by first giving its length, n, and then writing n lines, one item per line.

This is pretty standard, and it is the convention most problems use for most lists. The main exception seems to be lists of integers: many problems write such a list as a single line of space-separated integers. This seems to be the meta for most contests, though I have not looked at this very carefully, so this assumption may be wrong. NA ICPC is one of the contests that generally uses line-delimited lists of integers.
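To make the two layouts concrete, here is a minimal Python sketch of a reader that splits the input on any whitespace and therefore treats both layouts identically. The function name `read_int_list` and the sample values are illustrative assumptions, not anything taken from the contest.

```python
import io

def read_int_list(stream):
    """Read a length n, then n integers, ignoring which whitespace separates them."""
    tokens = iter(stream.read().split())
    n = int(next(tokens))
    return [int(next(tokens)) for _ in range(n)]

# The same list in the line-delimited layout and in the space-separated layout.
line_delimited = io.StringIO("3\n1\n3\n2\n")
space_separated = io.StringIO("3\n1 3 2\n")

print(read_int_list(line_delimited))   # [1, 3, 2]
print(read_int_list(space_separated))  # [1, 3, 2]
```

In a real submission the stream would be `sys.stdin`. A reader like this never notices the difference, so the choice of layout mostly affects people reading the samples.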

Here's my question to the competitive programming community at large — as a contestant, how do you feel about input formatting of this form?

  • Consistency is the most important thing, so all lists should be written as n followed by n lines, one per object.

    3

  • Consistency is not that important. For complicated lists where the object is not just a single string or an integer, I prefer lists to be written as n followed by n lines, one per object. However, for lists of integers or strings, I prefer n on one line followed by another line with n space-separated tokens.

    88

  • Consistency is not that important, and I don't care how lists of integers are written, since I'll just read them the same way no matter what.

    19

  • I don't believe that lists by default should be written as n followed by n lines, one per object.

    6

I also want to ask a similarly related question — as a contestant, have you thought about consistency of input formats in programming contests?

  • Yes, and it bothers me when contests are internally inconsistent with regard to reading until EOF or a sentinel, or having a prespecified n.

    60

  • Yes, and it bothers me when different contests follow different formats.

    1

  • Yes, but I don't really care.

    15

  • No.

    33
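
The three termination styles named in the first option of this poll (a prespecified n, a sentinel value, and reading until EOF) each need a slightly different reading loop. The sketch below is only an illustration: the function names are hypothetical, and the sentinel value 0 is an assumption.

```python
import io

def read_with_length(stream):
    # Prespecified n: the first token says how many values follow.
    tokens = stream.read().split()
    n = int(tokens[0])
    return [int(t) for t in tokens[1:1 + n]]

def read_until_sentinel(stream, sentinel=0):
    # Sentinel: values continue until a marker value (assumed to be 0) appears.
    values = []
    for token in stream.read().split():
        value = int(token)
        if value == sentinel:
            break
        values.append(value)
    return values

def read_until_eof(stream):
    # EOF: every remaining token belongs to the list.
    return [int(t) for t in stream.read().split()]

print(read_with_length(io.StringIO("3\n5\n7\n9\n")))     # [5, 7, 9]
print(read_until_sentinel(io.StringIO("5\n7\n9\n0\n")))  # [5, 7, 9]
print(read_until_eof(io.StringIO("5\n7\n9\n")))          # [5, 7, 9]
```

Note that the sentinel style quietly forbids the sentinel from appearing as a legitimate list value, which is one reason a prespecified length is often easier to reason about.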

40 hours ago, # |

Rev. 4  

+141

i move to abolish 'end of list denoted by 0/-1/eof'

on that note, i personally think one object per line for, say, a list of integers, is very hard to read as a human, even more so with multicase. consider the following 3 lists:

3
1
3
2
2
1
1
1
4

34 hours ago, # |

Rev. 2  

+22

I think for integers it's nice to have them all on the same line, as it takes less vertical space in the statement, and it's also nice to read in Python with list(map(int, input().split())). It is also visually more distinguishable if the input has more than one list.

Unless it is subtests with one integer each, in which case each subcase should generally start from a new line.
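
For comparison, here is a rough sketch of how the two layouts look when read line by line in Python. The helper names are hypothetical, and a `readline` callable is passed in (rather than calling `input()` directly) only so the example can run on in-memory data.

```python
import io

def read_list_one_line(readline):
    # Layout: n on one line, then all n integers on the next line.
    n = int(readline())
    values = list(map(int, readline().split()))
    assert len(values) == n, "the list length should match the announced n"
    return values

def read_list_one_per_line(readline):
    # Layout: n on one line, then one integer on each of the next n lines.
    n = int(readline())
    return [int(readline()) for _ in range(n)]

one_line = io.StringIO("3\n1 3 2\n")
per_line = io.StringIO("3\n1\n3\n2\n")

print(read_list_one_line(one_line.readline))      # [1, 3, 2]
print(read_list_one_per_line(per_line.readline))  # [1, 3, 2]
```

In a submission, `sys.stdin.readline` could be passed in the same way. Both versions are short, so the difference is mostly one of taste and of how the samples look in the statement.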

15 hours ago, # |

I have coordinated many contests and my position is strictly anti-consistency. My policy is:

  • If you are in an organizing team, never talk about consistency and grammar unless someone asked.
  • If you are truly bothered and can't sleep because of it: first, do it only if you've finished your duties; second, change it without wasting others' time; third, inform them.
  • Express gratitude to the teammate who generously allowed you to change the grammar and consistency rules. Some people think it works the other way around. It doesn't.

(The point is not about grammar; the point is about nitpicking. For example, I am not a native English speaker, so my English actually has serious grammar issues. In this case it's not a nitpick, it's an issue. In this case I asked xiaowuc1 to review my grammar.)

Contests exist to demonstrate problems — statements are good if and only if people understand them clearly. There exists no other goal than this. The problem with consistency is:

  • It's just a waste of time. People waste so much time on consistency, grammar and all that nitpicking bullshit. But after the contests we have weak tests and unprepared editorials, and people still don't understand those grammatically perfect statements. Why does that happen? Statement testing is not a grammar exam. It should be about thinking from the solver's perspective and clarifying what people might not understand. If you did all your duties and have free time, do check grammar and consistency, but honestly, in almost all contests we are out of time. It's simply a very sub-optimal way to improve the user experience. Understand and do what participants really need.
  • It doesn't create a better problemset. A consistent way is not always the best way to solve issues. The case in the article is already a good example. Another example: I was on a team that prepares IOI TST problems (all function implementation). Someone wanted to put function invocations in texttt () and all function variables in braces (). This is a reasonable idea. But there was a problem where the function received an argument with an underscore (). It made the statement look ugly. I decided to write it as . What is the consistent way? I don't give a fuck, I just like this way of writing the underscore.
  • Flame wars. Different people work as a team and prepare different problems. They all have their own standards and ideas on how to format inputs and statements. Consistency means some people's standards should be ignored. That could happen, but sometimes they fight to defend their genius ideas on spaces versus newlines. Please, shut up! Let others do what they want, and do your thing.

Of course, I'm not against all nitpicking. For example, some grammar mistakes cause confusion. It's a bad expression and should be fixed. But if the expression is grammatically wrong but unambiguous, it is not an issue. It's okay to have such issues even in the final version.

There could be some contestants who are bothered by internal inconsistency within a single problemset. But as long as it does not lead them to make mistakes (e.g. using both mod 998244353 and mod 1000000007 in a single problemset can prompt a mistake, so it is better avoided if possible), those tickling feelings are the contestant's problem, not the contest's problem. Contests should do the right thing even though people may not like it.

10 hours ago, # |

Can your contestants read the samples without pain? Consider the following:

2
3
1
1
2
3
1
1
5

Number of test cases, number of elements in the list, one per line... except there's nothing to guide the eyes to immediately distinguish that. The best course of action is to write it down on paper in a better way and never look at the original statement again.
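
The "write it down in a better way" step can also be done mechanically. As a small sketch, assuming the layout just described (number of test cases, then for each case a length followed by that many items), the hypothetical helper below rewrites the sample so that each list sits on one line:

```python
def reformat_cases(text):
    """Rewrite 'one item per line' multi-case input with each list on one line."""
    tokens = iter(text.split())
    t = int(next(tokens))            # number of test cases
    lines = [str(t)]
    for _ in range(t):
        n = int(next(tokens))        # number of elements in this list
        items = [next(tokens) for _ in range(n)]
        lines.append(str(n))
        lines.append(" ".join(items))
    return "\n".join(lines)

sample = "2\n3\n1\n1\n2\n3\n1\n1\n5\n"
print(reformat_cases(sample))
# 2
# 3
# 1 1 2
# 3
# 1 1 5
```

In the rewritten form the two lists, 1 1 2 and 1 1 5, are visible at a glance.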

Consider also that consistency among all possible lists together is actually inconsistent because lists of different objects naturally behave differently. A list of lists should be one per line, a list of integers should be in one line, for strings either is fine, for a general 3d list there are no good options since we're constrained to a 2d medium. In a lot of specific cases like a list of lists of strings or binary values, you also intuitively know what to do.

You should aim to be consistent in things like keeping the same input formatting for the same type of input, putting the length before every list (or not, but doing it is more convenient), using the same input validation tools everywhere, etc.

105 minutes ago, # |

Can anyone who prefers newline separation justify why this format is better than space separation? I prefer space separation for pretty much the same reason others have described--it's vastly easier to read when working out sample cases on paper. Perhaps this makes things marginally easier for users of other languages, but off the top of my head I can't think of nontrivial benefits to Python/Java users.

Also, big +1 to ko_osaga's comment. This decision does not seem sufficiently high-impact to justify spending too much energy arguing over it.

17 minutes ago, # |

This is an interesting question.

And people in the comments argue that there are matters more important for them than consistency. One is readability. Another is cost vs. benefit of making stuff consistent.

Which raises the question: why is consistency important? The original post just states that consistency was the primary argument motivating some design decisions. But it does not say why.

So, why? I'd much like to see that spelt out as well.

