How to change the data type of a C++ vector in a "union-like" way

I would like to know if it's possible in C++ to change the type of a std::vector already filled with values, exactly the way a union works, i.e.:

  • not changing a single bit of the binary content
  • not performing any type conversion (no mathematical operations)
  • just reinterpreting the binary data as a new type (e.g. uint16 or float32), without any memory copy or reallocation (as I would like to use vectors several gigabytes in size)

For example, I have a vector filled with 20 values: 0x00, 0x01, 0x02, 0x03 ... and I want to reinterpret it as a vector of 10 values with the same overall binary content: 0x0001, 0x0203 ... (or 0x0100, 0x0302 ..., depending on the little-endian / big-endian convention)
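A minimal sketch of that reinterpretation that stays within well-defined C++: instead of building a second vector, each 16-bit value is read directly out of the existing byte buffer with std::memcpy, which copies only sizeof(T) bytes per access. The helper name load_as is hypothetical, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper: reads the k-th value of type T stored in a byte
// buffer, i.e. bytes [k*sizeof(T), (k+1)*sizeof(T)). Only sizeof(T) bytes
// are copied, so the buffer itself is never duplicated; optimizing
// compilers typically turn the memcpy into a single load.
template <typename T>
T load_as(const std::vector<std::uint8_t>& bytes, std::size_t k)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    if ((k + 1) * sizeof(T) > bytes.size())
        throw std::out_of_range("load_as: index past the end of the buffer");
    T value;
    std::memcpy(&value, bytes.data() + k * sizeof(T), sizeof(T));
    return value;
}
```

With the 20 bytes 0x00, 0x01, ..., load_as<std::uint16_t>(bytes, 0) yields 0x0100 on a little-endian machine and 0x0001 on a big-endian one, which is exactly the endianness dependence mentioned above.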

The closest thing I could do is:

vector<uint8_t> test8(20);
uint16_t* pv16 = (uint16_t*) (&test8[0]);
vector<uint16_t> test16(pv16, pv16+10);

The result is exactly what I want, except that it makes a copy of the entire data, whereas I would like to use the existing data.

I would appreciate any help on this subject.

Thanks a lot for your answer.

You probably don't need a full-blown vector, just something that behaves like a container. You can create your own punned_view that just references the memory in the existing vector.

Please also read up on type punning and undefined behavior in C++, as it's quite a subtle topic. See https://blog.regehr.org/archives/959

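#include <type_traits>
#include <cstddef>
#include <cstring>
#include <cstdint>
#include <stdexcept>
#include <vector>

template <typename To>
class punned_view
{
    static_assert(std::is_trivial<To>::value, "To must be a trivial type");
    const char* begin_;
    const char* end_;
public:
    template <typename From>
    punned_view(From* begin, From* end)
        : begin_{reinterpret_cast<const char*>(begin)}
        , end_{reinterpret_cast<const char*>(end)}
    {
        static_assert(sizeof(To) >= sizeof(From), "To must not be smaller than From"); // exercise to make it work with smaller types too
        static_assert(std::is_trivial<From>::value, "From must be a trivial type");
        // the byte range must hold a whole number of To values, otherwise
        // the iterator would step past end_ and never compare equal to it
        if ((end_ - begin_) % sizeof(To) != 0)
            throw std::invalid_argument("punned_view: size is not a multiple of sizeof(To)");
    }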

    std::size_t size() const noexcept
    {
        return (end_ - begin_) / sizeof(To);
    }

    class const_iterator
    {
        const char* current_;
    public:
        const_iterator(const char* current)
            : current_{current}
        { }

        const_iterator& operator++() noexcept
        {
            current_ += sizeof(To);
            return *this;
        }
        To operator*() const noexcept
        { 
            To result;
            // memcpy is the portable, well-defined way to type pun here
            // (C++20 also offers std::bit_cast)
            std::memcpy(&result, current_, sizeof(result));
            return result;
        }
        bool operator != (const_iterator other) const noexcept
        {
            return current_ != other.current_;
        }
    };

    const_iterator begin() const noexcept { return {begin_}; }
    const_iterator end() const noexcept { return {end_}; }
};

uint16_t sum_example(const std::vector<uint8_t>& vec)
{
    punned_view<uint16_t> view{vec.data(), vec.data() + vec.size()};

    uint16_t sum = 0;
    for (uint16_t v : view)
        sum += v;

    return sum;
}
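The view above only supports forward iteration, while something like an image viewer usually needs random access to individual pixels. A possible random-access variant, sketched under the same memcpy approach (the class name indexed_punned_view is hypothetical), stores just a pointer and an element count and decodes element i on demand:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical random-access counterpart to punned_view. It never owns or
// copies the buffer; each operator[] call copies only sizeof(To) bytes.
template <typename To>
class indexed_punned_view
{
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    const unsigned char* data_;
    std::size_t count_;
public:
    indexed_punned_view(const void* data, std::size_t size_in_bytes)
        : data_{static_cast<const unsigned char*>(data)}
        , count_{size_in_bytes / sizeof(To)}
    {
        if (size_in_bytes % sizeof(To) != 0)
            throw std::invalid_argument("size is not a multiple of sizeof(To)");
    }

    std::size_t size() const noexcept { return count_; }

    // No bounds check, like std::vector::operator[]
    To operator[](std::size_t i) const noexcept
    {
        To result;
        std::memcpy(&result, data_ + i * sizeof(To), sizeof(To));
        return result;
    }
};
```

It can be built from a vector with indexed_punned_view<std::uint16_t> view{vec.data(), vec.size()}, after which view[i] reads the i-th 16-bit value in place.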

Thank you for all your quick and detailed answers. I was pleasantly surprised: the last time I used a forum (Eclipse), I got exactly zero answers in an entire month...

Anyway, before testing the different solutions you suggested, I first wanted to react to the excellent point raised by David Schwartz: yes, my question is definitely an XY question, and yes, I completely forgot to mention the context that led me to this exotic situation and what my real needs are.

To make a long story short, what I really want is to read the content of a tiff image (a satellite image with only gray-scale values, no RGB or any other color combination) using gdal in C++, then perform some simple operations, some as basic as getting the right pixel values. Sounds simple as hell, doesn't it? In real life, though, everything is a nightmare when using gdal (which is as powerful as it is cryptic) WITHOUT knowing the actual pixel data type beforehand (which could be basically any integer or floating-point type, at any precision). As far as I could understand from tutorials, examples and forums, gdal offers me only two (hardly satisfactory) ways of reading the content of a tiff image:

1) Either I know the exact pixel data type of my image (e.g. int16) and hardcode it somewhere, which I cannot afford (and templates would not help here, since at some point I have to store the content of my image in a variable, which means I must know its precise type).

2) Or I can read an image of any pixel data type, but with an automatic conversion into a given target type (e.g. float64 to cover all possible value ranges). This sounds convenient and easy, but the systematic conversion is a potentially huge waste of time and memory (think of uint8 in the source array converted into float64 in the target array: eight times the memory!). That is an insane option for me, as I usually work with massively big images (several gigapixels!).

3) I figured out by myself an ugly/clumsy alternative, where I let gdal load the image content as "raw binary" data (officially an array of bytes) and then try to read it back by interpreting it according to the real data type (which gdal can tell me afterwards). The good side is that the exact binary content of the image is loaded with no conversion whatsoever, so speed and memory usage are optimal. The downside is that I then have to fiddle with this binary data to interpret it correctly while avoiding any copy or mathematical operation.
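One common way to combine option 3 with templates is to examine the runtime type tag in exactly one switch and write everything else once as a template. A minimal sketch, assuming a hypothetical PixelType enum standing in for the data type gdal reports (in gdal it would come from GDALRasterBand::GetRasterDataType()) and a raw byte buffer already filled by the read; the function names are hypothetical too:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the data type gdal reports for a band
// (GDT_Byte, GDT_Int16, GDT_UInt16, GDT_Float32, GDT_Float64, ...).
enum class PixelType { UInt8, Int16, UInt16, Float32, Float64 };

// Type-specific work, written once as a template. Here: the sum of all
// pixels in a raw byte buffer, each pixel decoded in place with memcpy.
template <typename T>
double sum_as(const std::vector<unsigned char>& raw)
{
    const std::size_t count = raw.size() / sizeof(T);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        T value;
        std::memcpy(&value, raw.data() + i * sizeof(T), sizeof(T));
        sum += static_cast<double>(value);
    }
    return sum;
}

// The only place that looks at the runtime type: one switch that forwards
// to the matching template instantiation.
double sum_pixels(PixelType type, const std::vector<unsigned char>& raw)
{
    switch (type) {
    case PixelType::UInt8:   return sum_as<std::uint8_t>(raw);
    case PixelType::Int16:   return sum_as<std::int16_t>(raw);
    case PixelType::UInt16:  return sum_as<std::uint16_t>(raw);
    case PixelType::Float32: return sum_as<float>(raw);
    case PixelType::Float64: return sum_as<double>(raw);
    }
    throw std::invalid_argument("sum_pixels: unsupported pixel type");
}
```

Since the band's type can be queried before any pixel is read, another option is to allocate a std::vector<T> of the matching T inside each case and have gdal read straight into it using the band's own type as the buffer type; as far as I know that involves no conversion and avoids punning and alignment questions altogether.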

So that's what led me into this awkward attempt at "in-place reinterpretation" of my data (or whatever the proper name is), simply because I thought it would be a very simple final step to getting the job done. But I might be wrong, and I might have overlooked simpler/cleaner solutions (actually, I hope I have!).

Some final thoughts in order to "de-Y" my XY question!!!

_ Using the gdal library seems almost mandatory here: as far as I know, it is the only library that properly handles the kind of image I am dealing with, i.e. multi-band tiff images (other libraries typically assume 3 bands and blindly interpret them as RGB color components, which is absolutely not what I want here).

_ I also gave gdal for Python a quick try, but handling gigapixel images in Python definitely sounds like the wrong choice. Moreover, my next step should be a basic interactive image viewer (probably using Qt), so execution speed really matters.

_ I mentioned std::vector a lot because I thought it would be easier to work with, but an old-school C array would probably do the job.

_ Finally, I saw many answers mentioning alignment issues, which is something I am not comfortable with and would rather not mess with...
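The alignment worry largely goes away if every typed access goes through std::memcpy, because memcpy has no alignment requirement on the addresses it copies from. A small sketch with hypothetical helper names: read_u32_at decodes a value starting at any byte offset (even an odd one, where dereferencing a cast pointer could be misaligned), and is_aligned_for is only a diagnostic.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decodes a uint32_t starting at an arbitrary byte
// offset. memcpy works on any address, so no alignment is required.
inline std::uint32_t read_u32_at(const unsigned char* base, std::size_t byte_offset)
{
    std::uint32_t value;
    std::memcpy(&value, base + byte_offset, sizeof value);
    return value;
}

// Hypothetical diagnostic: reports whether p is suitably aligned for T.
// Converting a pointer to an integer this way is implementation-defined,
// but it behaves as expected on mainstream platforms.
template <typename T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A std::vector<T> (or an array of T) is always suitably aligned for T, so allocating storage of the real pixel type, rather than bytes, is another way to keep alignment out of the picture.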

So again, any further advice is welcome, including throwing away some of my previous attempts if that simplifies the situation and leads to a more direct solution, which is really something I would dream of.

Thanks again.

Getting the data as another type can be achieved with some pointer and cast magic. Since C++11 you can get a pointer to the raw data of a std::vector with data(): http://www.cplusplus.com/reference/vector/vector/data/

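#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    std::vector<uint32_t> myvector;
    myvector.push_back(0x12345678);
    myvector.push_back(400);

    void* p = myvector.data();
    // view the 32-bit elements as twice as many 16-bit values;
    // strictly speaking this breaks the strict-aliasing rule
    uint16_t* p2 = (uint16_t*)p;

    for (std::size_t i = 0; i < 2 * myvector.size(); i++) {
        std::cout << *p2++ << ",";
    }
    std::cout << "\n";
}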

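As always with pointer casts, you are telling the compiler that you know better than it how to interpret the data, and it will happily let you ignore alignment and endianness and do all the harm you care to do with it. (Strictly speaking, reading uint32_t objects through a uint16_t* also violates the strict-aliasing rule, so the behavior is undefined even when it appears to work.)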
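For comparison, a sketch of the same loop without the aliasing problem: each 16-bit half is copied out of the vector's bytes with std::memcpy instead of being read through a cast pointer. The function name half_at is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the i-th 16-bit half of the vector's bytes,
// i.e. bytes [2*i, 2*i+2). Which half of each uint32_t comes first depends
// on the machine's endianness, exactly as with the pointer cast.
std::uint16_t half_at(const std::vector<std::uint32_t>& v, std::size_t i)
{
    std::uint16_t half;
    std::memcpy(&half,
                reinterpret_cast<const unsigned char*>(v.data()) + i * sizeof half,
                sizeof half);
    return half;
}
```

Printing half_at(myvector, i) for i from 0 to 2 * myvector.size() - 1 gives 22136,4660,400,0, on a little-endian machine (0x5678, 0x1234, then the two halves of 400).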
