
bool PoDoFo::PdfContentsTokenizer::ReadNext ( EPdfContentsType & reType,
const char *&  rpszKeyword,
PoDoFo::PdfVariant & rVariant
)

Reads the next keyword or variant. Returns true and sets reType if something was read. On a true return, exactly one of rpszKeyword and rVariant holds a defined, usable value; the value of reType indicates which.

If EOF is encountered, returns false and leaves reType, rpszKeyword and rVariant undefined.

As a special case, reType may be set to ePdfContentsType_ImageData. In this case rpszKeyword is undefined, and rVariant holds a PdfData variant containing the byte sequence between the ID and EI keywords, excluding the single byte of leading and trailing white space. No filter decoding is performed.
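The trimming rule for inline image data can be illustrated with a small standalone sketch. This is not PoDoFo's implementation: the helper names ExtractInlineImageData and IsPdfWhitespace are hypothetical, and the sketch assumes the end-of-image keyword is the first white-space-delimited EI in the buffer, which real binary image data could violate.

```cpp
#include <cstddef>
#include <string>

// PDF white-space characters: NUL, HT, LF, FF, CR and SP.
static bool IsPdfWhitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Hypothetical helper: `afterId` holds the raw bytes that follow the ID keyword.
// On success, `data` receives the bytes between ID and EI without the one
// leading and one trailing white-space byte. No filter decoding is attempted.
bool ExtractInlineImageData(const std::string& afterId, std::string& data)
{
    const std::size_t n = afterId.size();
    if (n == 0 || !IsPdfWhitespace(afterId[0]))
        return false; // ID must be followed by a single white-space byte

    // Look for <white space> 'E' 'I' followed by white space or end of input.
    for (std::size_t i = 1; i + 2 < n; ++i)
    {
        if (IsPdfWhitespace(afterId[i]) && afterId[i + 1] == 'E' && afterId[i + 2] == 'I' &&
            (i + 3 == n || IsPdfWhitespace(afterId[i + 3])))
        {
            data = afterId.substr(1, i - 1); // skip byte 0, stop before byte i
            return true;
        }
    }
    return false; // no terminating EI keyword found
}
```

A caller that needs decoded pixels would still have to apply the filters named in the inline image dictionary itself, since the bytes are returned as they appear in the stream.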

Parameters:
[out] reType       Set to the type of item read (keyword, variant or inline image data) if true is returned. Undefined if false is returned.
[out] rpszKeyword  If reType is set to ePdfContentsType_Keyword, this points to the keyword; otherwise the value is undefined. If set, the value points to memory owned by the PdfContentsTokenizer and must not be freed. The value is invalidated when ReadNext is next called or when the PdfContentsTokenizer is destroyed.
[out] rVariant     If reType is set to ePdfContentsType_Variant or ePdfContentsType_ImageData, this is set to the variant that was read; otherwise the value is undefined.
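A typical consumer of this contract collects operands until a keyword arrives, then treats the keyword as the operator for those operands. The sketch below shows that loop against a stand-in class, so it compiles without PoDoFo. MockTokenizer, Token, Operation and CollectOperations are hypothetical names, and operands are modelled as plain strings rather than PdfVariant. Note that the keyword is copied immediately, because the pointer is only valid until the next ReadNext call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; only the ReadNext contract is modelled here.
enum ContentsType { kKeyword, kVariant };

struct Token {
    ContentsType type;
    std::string  text; // keyword name, or printed form of the operand
};

class MockTokenizer {
public:
    explicit MockTokenizer(std::vector<Token> tokens) : m_tokens(std::move(tokens)) {}

    // Same shape as ReadNext: returns false at EOF and sets only the
    // output that matches the reported type.
    bool ReadNext(ContentsType& type, const char*& keyword, std::string& variant)
    {
        if (m_pos >= m_tokens.size())
            return false;
        const Token& t = m_tokens[m_pos++];
        type = t.type;
        if (t.type == kKeyword)
            keyword = t.text.c_str(); // owned by the tokenizer
        else
            variant = t.text;
        return true;
    }

private:
    std::vector<Token> m_tokens;
    std::size_t        m_pos = 0;
};

struct Operation {
    std::string              op;
    std::vector<std::string> operands;
};

// Push operands on a stack; a keyword consumes everything pushed so far.
std::vector<Operation> CollectOperations(MockTokenizer& tok)
{
    std::vector<Operation>   ops;
    std::vector<std::string> stack;
    ContentsType type;
    const char*  keyword = nullptr;
    std::string  variant;

    while (tok.ReadNext(type, keyword, variant))
    {
        if (type == kVariant)
        {
            stack.push_back(variant);
        }
        else
        {
            assert(keyword != nullptr);
            // Copy the keyword now: the pointer is invalidated by the next call.
            ops.push_back(Operation{std::string(keyword), stack});
            stack.clear();
        }
    }
    return ops;
}
```

With real PoDoFo types the same loop shape applies, with an additional branch for ePdfContentsType_ImageData.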

Definition at line 105 of file PdfContentsTokenizer.cpp.

References PoDoFo::PdfTokenizer::DetermineDataType(), PoDoFo::ePdfContentsType_Keyword, PoDoFo::ePdfContentsType_Variant, PoDoFo::ePdfDataType_Array, PoDoFo::ePdfDataType_Bool, PoDoFo::ePdfDataType_Dictionary, PoDoFo::ePdfDataType_HexString, PoDoFo::ePdfDataType_Name, PoDoFo::ePdfDataType_Null, PoDoFo::ePdfDataType_Number, PoDoFo::ePdfDataType_RawData, PoDoFo::ePdfDataType_Real, PoDoFo::ePdfDataType_Reference, PoDoFo::ePdfDataType_String, PoDoFo::ePdfDataType_Unknown, PoDoFo::ePdfError_InvalidDataType, PoDoFo::PdfTokenizer::GetNextToken(), m_lstContents, m_readingInlineImgData, PODOFO_RAISE_ERROR_INFO, PoDoFo::PdfTokenizer::ReadArray(), PoDoFo::PdfTokenizer::ReadDictionary(), PoDoFo::PdfTokenizer::ReadHexString(), PoDoFo::PdfTokenizer::ReadName(), PoDoFo::PdfTokenizer::ReadString(), and SetCurrentContentsStream().

Referenced by TextExtractor::ExtractText().

{
    if (m_readingInlineImgData)
        return ReadInlineImgData(reType, rpszKeyword, rVariant);
    EPdfTokenType eTokenType;
    EPdfDataType  eDataType;
    const char*   pszToken;

    // While officially the keyword pointer is undefined if not needed, it
    // costs us practically nothing to zero it (in case someone fails to check
    // the return value and/or reType). Do so. We won't nullify the variant
    // since that has a real cost.
    //rpszKeyword = 0;

    // If we've run out of data in this stream and there's another one to read,
    // switch to reading the next stream.
    //if( m_device.Device() && m_device.Device()->Eof() && m_lstContents.size() )
    //{
    //    SetCurrentContentsStream( m_lstContents.front() );
    //    m_lstContents.pop_front();
    //}

    bool gotToken = this->GetNextToken( pszToken, &eTokenType );
    if ( !gotToken )
    {
        if ( m_lstContents.size() )
        {
            // We ran out of tokens in this stream. Switch to the next stream
            // and try again.
            SetCurrentContentsStream( m_lstContents.front() );
            m_lstContents.pop_front();
            return ReadNext( reType, rpszKeyword, rVariant );
        }
        else
        {
            // No more content stream tokens to read.
            return false;
        }
    }

    eDataType = this->DetermineDataType( pszToken, eTokenType, rVariant );

    // Assume we read a variant unless we discover otherwise later.
    reType = ePdfContentsType_Variant;

    switch( eDataType )
    {
        case ePdfDataType_Null:
        case ePdfDataType_Bool:
        case ePdfDataType_Number:
        case ePdfDataType_Real:
            // the data was already read into rVariant by the DetermineDataType function
            break;

        case ePdfDataType_Reference:
        {
            // references are invalid in content streams
            PODOFO_RAISE_ERROR_INFO( ePdfError_InvalidDataType, "references are invalid in content streams" );
            break;
        }

        case ePdfDataType_Dictionary:
            this->ReadDictionary( rVariant, NULL );
            break;
        case ePdfDataType_Array:
            this->ReadArray( rVariant, NULL );
            break;
        case ePdfDataType_String:
            this->ReadString( rVariant, NULL );
            break;
        case ePdfDataType_HexString:
            this->ReadHexString( rVariant, NULL );
            break;
        case ePdfDataType_Name:
            this->ReadName( rVariant );
            break;

        case ePdfDataType_Unknown:
        case ePdfDataType_RawData:
        default:
            // Assume we have a keyword
            reType     = ePdfContentsType_Keyword;
            rpszKeyword = pszToken;
            break;
    }
    std::string idKW ("ID");
    if ((reType == ePdfContentsType_Keyword) && (idKW.compare(rpszKeyword) == 0) )
        m_readingInlineImgData = true;
    return true;
}
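The fall-through from one contents stream to the next, which the listing above handles by calling ReadNext recursively, can also be written as a loop. The following is a minimal sketch of that idea only, using a hypothetical MultiStreamReader in which each stream is modelled as a queue of already-split tokens; it is not a drop-in replacement for the PoDoFo code.

```cpp
#include <deque>
#include <list>
#include <string>
#include <utility>

// Hypothetical model: each contents stream is a queue of tokens.
class MultiStreamReader {
public:
    explicit MultiStreamReader(std::list<std::deque<std::string>> streams)
        : m_pending(std::move(streams)) {}

    // Returns the next token, switching to the next pending stream whenever
    // the current one is exhausted. Returns false once every stream is empty.
    bool ReadNext(std::string& token)
    {
        for (;;)
        {
            if (!m_current.empty())
            {
                token = m_current.front();
                m_current.pop_front();
                return true;
            }
            if (m_pending.empty())
                return false; // no more content stream tokens to read

            // Ran out of tokens in this stream: move on and try again.
            m_current = std::move(m_pending.front());
            m_pending.pop_front();
        }
    }

private:
    std::deque<std::string>            m_current;
    std::list<std::deque<std::string>> m_pending;
};
```

The loop form keeps the stack depth constant even when a page has many empty contents streams in a row, whereas the recursive form uses one frame per stream switch.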



Generated by Doxygen 1.6.0