
Andrew Ayer



March 2, 2013

GCC's Implementation of basic_istream::ignore() is Broken

The implementation of std::basic_istream::ignore() in GCC's C++ standard library (libstdc++) suffers from a serious flaw. After ignoring the n characters as requested, it checks whether end-of-file has been reached, and if so, it sets the stream's eofbit. The problem is that to check for end-of-file, ignore() essentially has to peek one character beyond the last one you asked it to ignore. So if you ask it to ignore all the characters currently available in the stream buffer, ignore() triggers an underflow of the buffer (a call to the stream buffer's underflow() function to refill it). For a file stream, the buffer can be refilled from the filesystem in a finite amount of time, so this is merely inefficient. But for a socket, the underflow can be fatal: your program may block forever waiting for bytes that never come. This is horribly unintuitive, and it is inconsistent with std::basic_istream::read(), which does not check for end-of-file after reading the requested number of characters.
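To observe the extra peek without a real socket, here is a small probe. It is a sketch under stated assumptions: `SocketLikeBuf`, `IgnoreReport`, and `probe_member_ignore` are hypothetical names, and the buffer's `underflow()` (which the stream buffer calls when it must refill) merely counts the call and reports end-of-file where a real socket would block. The probe asks the member ignore() to discard exactly the five buffered bytes. Per the description above, an implementation that peeks ahead should report one underflow call and eof() == true, while one that stops as soon as n characters are extracted should report zero calls and eof() == false.

```cpp
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a socket buffer: five bytes have "arrived".
// Asking for more would block on a real socket, so underflow() here only
// records that it was called and then reports end-of-file.
struct SocketLikeBuf : std::streambuf {
    std::string data;
    int underflows = 0;

    explicit SocketLikeBuf(std::string d) : data(std::move(d)) {
        setg(&data[0], &data[0], &data[0] + data.size());
    }

    int_type underflow() override {
        ++underflows;               // a real socket read would hang here
        return traits_type::eof();
    }
};

struct IgnoreReport {
    std::streamsize gcount;  // characters ignore() reports as extracted
    int underflows;          // how often the buffer was asked to refill
    bool eof;                // whether eofbit ended up set
};

// Ask the library's member ignore() to discard exactly the buffered bytes.
IgnoreReport probe_member_ignore() {
    SocketLikeBuf buf("hello");
    std::istream in(&buf);
    in.ignore(5);
    return {in.gcount(), buf.underflows, in.eof()};
}
```

Calling `probe_member_ignore()` from a test program and printing the fields shows which behavior your standard library exhibits.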

The root of this problem is that the C++ Standard is perhaps not as clear as it should be about ignore(). From section 27.7.2.3:

basic_istream<charT,traits>& ignore(streamsize n = 1, int_type delim = traits::eof());

Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:

  • if n != numeric_limits<streamsize>::max() (18.3.2), n characters are extracted
  • end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
  • traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
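The most common call to ignore() exercises all three conditions: passing `std::numeric_limits<std::streamsize>::max()` disables the count limit (first bullet), so extraction stops either at the delimiter, which is itself extracted (third bullet), or at end-of-file (second bullet). The helpers `skip_line` and `second_line` below are illustrative names, not part of any library.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Illustrative helper: discard everything up to and including the next
// '\n'. With n == numeric_limits<streamsize>::max() there is no count
// limit, so only the delimiter or end-of-file can stop extraction.
std::streamsize skip_line(std::istream& in) {
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return in.gcount();  // includes the extracted '\n'
}

// Example: skip the first line of some text and return the second.
std::string second_line(const std::string& text) {
    std::istringstream in(text);
    skip_line(in);
    std::string line;
    std::getline(in, line);
    return line;
}
```

For the input "skip this\nkeep", skip_line() discards ten characters (nine plus the newline), and second_line() returns "keep".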

Note that the Standard does not specify the order in which the checks should be performed, which suggests that a conformant implementation may check for end-of-file before checking whether n characters have been extracted, as GCC does. You may think that the order is implicit in the ordering of the bullet points, but if it were, why would the Standard state the order explicitly in the case of getline()? From section 27.7.2.3:

basic_istream<charT,traits>& getline(char_type* s, streamsize n, char_type delim);

Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until one of the following occurs:

  1. end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
  2. traits::eq(c, delim) for the next available input character c (in which case the input character is extracted but not stored);
  3. n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).

These conditions are tested in the order shown.
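The effect of that explicit ordering can be checked with a short sketch. The struct `GetlineResult` and helper `getline_state` are illustrative names; the helper calls the member getline() with n = 4, so at most three characters fit before the terminating null, and records which state bits end up set. Because end-of-file and the delimiter are checked before the count, a line that exactly fills the array is not treated as too long.

```cpp
#include <istream>
#include <sstream>
#include <string>

struct GetlineResult {
    std::string stored;  // characters stored in the array
    bool eof;            // eofbit set?
    bool fail;           // failbit set?
};

// Illustrative helper: run istream::getline with n = 4 on the given text.
GetlineResult getline_state(const std::string& text) {
    std::istringstream in(text);
    char buf[4] = {};
    in.getline(buf, 4);  // delimiter defaults to '\n'
    return {buf, in.eof(), in.fail()};
}

// "abc\n":  after 3 characters the next one is the delimiter (condition 2),
//           which is tested before condition 3, so failbit stays clear.
// "abcd\n": the next character is neither end-of-file nor the delimiter,
//           so condition 3 applies and failbit is set.
// "abc":    end-of-file (condition 1) comes first: eofbit is set, but not
//           failbit, because some characters were extracted.
```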

This argument from getline()'s explicit ordering is at least one GCC developer's justification for GCC's behavior. However, I have a different take: I believe that the only way to satisfy the Standard's requirements for ignore() is to perform the checks in the order presented. The Standard says that "characters are extracted until any of the following occurs." That means that as soon as n characters have been extracted, ignore() must stop, since that condition is among "any of the following." But if ignore() first checks for end-of-file and blocks forever, it never stops. This constrains the order in which a conformant implementation can check the conditions, and it may be why the Standard does not need to specify an explicit order here but does for getline(), where it really does want the end-of-file check to occur first.
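The termination argument can be made concrete with a toy model that is not taken from any real library. A hypothetical `Source` has `ready` characters available; looking at any character past those would block on a socket, so the model just records it in `would_block`. The two hypothetical loops differ only in the order in which they test "n characters extracted" and "a character is available"; the delimiter condition is left out to keep the model small.

```cpp
#include <cstddef>

// Toy model of a socket: `ready` characters have arrived. Looking at any
// character past them would block a real program; here we just record it.
struct Source {
    std::size_t ready;
    std::size_t pos = 0;
    bool would_block = false;

    explicit Source(std::size_t r) : ready(r) {}

    // True if a character is available at the current position.
    bool has_char() {
        if (pos < ready) return true;
        would_block = true;  // a real socket read would hang here
        return false;
    }
};

// Bullet order: stop as soon as n characters are extracted; availability
// is only tested while more characters are still wanted.
std::size_t ignore_count_first(Source& s, std::size_t n) {
    std::size_t count = 0;
    while (count < n && s.has_char()) { ++s.pos; ++count; }
    return count;
}

// Availability first: the test runs again after the n-th character has
// been extracted, so it looks one character too far.
std::size_t ignore_eof_first(Source& s, std::size_t n) {
    std::size_t count = 0;
    while (s.has_char() && count < n) { ++s.pos; ++count; }
    return count;
}
```

With five characters ready and n = 5, both loops extract five characters, but only `ignore_eof_first` sets `would_block`.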

I have left a comment on the GCC bug stating my interpretation. One problem with fixing this bug is that it will break code that has come to depend on eofbit being set when you ignore all the data remaining on a stream, though I'm frankly skeptical that much code makes that assumption. Also, both LLVM's libc++ and Microsoft Visual Studio (version 2005, at least) implement ignore() according to my interpretation of the Standard.

In the meantime, be very, very careful with your use of ignore(). Only use it on file streams or when you know you'll be ignoring fewer characters than are available to be read. And don't rely on eofbit being set one way or the other.

If you need a more reliable version of ignore(), I've written a non-member function implementation which takes a std::basic_istream as its first argument. It is very nearly a drop-in replacement for the member function (it even properly throws exceptions depending on the stream's exceptions mask), except that it returns the number of bytes ignored rather than a reference to the stream, because a non-member function cannot set the value returned by gcount().
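That implementation is not reproduced here. As an illustration of the approach only, here is a minimal sketch of such a non-member function; the name `ignore_exact` is hypothetical. It tests the conditions in the Standard's bullet order, using sbumpc() so that a character is requested from the buffer only while one is still wanted, and it follows the unformatted-input rules for the sentry, end-of-file, and exceptions thrown by the stream buffer.

```cpp
#include <ios>
#include <istream>
#include <limits>
#include <sstream>
#include <streambuf>

// Hypothetical non-member replacement for basic_istream::ignore().
// The count is tested first, so no character beyond the n-th is ever
// requested from the buffer. Returns the number of characters extracted
// (what gcount() would have reported).
template <class CharT, class Traits>
std::streamsize ignore_exact(std::basic_istream<CharT, Traits>& in,
                             std::streamsize n = 1,
                             typename Traits::int_type delim = Traits::eof()) {
    using int_type = typename Traits::int_type;
    // noskipws = true, as for any unformatted input function.
    typename std::basic_istream<CharT, Traits>::sentry ok(in, true);
    if (!ok) return 0;

    const bool unlimited = (n == std::numeric_limits<std::streamsize>::max());
    std::streamsize count = 0;
    std::ios_base::iostate err = std::ios_base::goodbit;
    try {
        std::basic_streambuf<CharT, Traits>* sb = in.rdbuf();
        while (unlimited || count < n) {       // 1. n characters extracted?
            const int_type c = sb->sbumpc();
            if (Traits::eq_int_type(c, Traits::eof())) {
                err |= std::ios_base::eofbit;  // 2. end-of-file
                break;
            }
            ++count;
            if (Traits::eq_int_type(c, delim))
                break;                         // 3. delimiter (extracted)
        }
    } catch (...) {
        // Unformatted-input rule: set badbit, and rethrow the original
        // exception only if badbit is in the exceptions mask.
        try { in.setstate(std::ios_base::badbit); }
        catch (const std::ios_base::failure&) {}
        if (in.exceptions() & std::ios_base::badbit) throw;
        return count;
    }
    if (err != std::ios_base::goodbit)
        in.setstate(err);  // may throw, depending on the exceptions mask
    return count;
}

// Usage: ignoring exactly the remaining characters leaves eofbit clear,
// because no sixth character is requested from the buffer.
bool demo_no_eof_after_exact_ignore() {
    std::istringstream in("world");
    return ignore_exact(in, 5) == 5 && !in.eof();
}
```

Only a subsequent read that actually needs another character discovers end-of-file, which mirrors how read() behaves.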

Posted on 2013-03-02 at 21:54:48 UTC


Hi, I'm Andrew. I'm the founder of SSLMate, which makes SSL certificates easy through automation, great software, and friendly support.

I blog about security, PKI, Linux, and more. If you liked this post, check out my other posts or subscribe to my Atom feed.

My email address is andrew@agwa.name. I'm AGWA at GitHub and @__agwa on Twitter.
