The processor we use today does not represent values in the same way a dynamic programming language does. The processor is still made for static languages.
This makes a dynamic language slower because it has to unbox the values. The language has to convert each value from the representation used by the language into the representation used by the processor.
I think there might be a way to solve this problem that is easy to overlook: change the processor. As hard as it is to imagine, changing the processor may be inevitable. Top websites like Facebook, Reddit and Wikipedia all run on dynamic languages with no signs of this trend going away. If the future of programming is dominated by dynamic languages, a processor made for static ones won't be the best one to run them.
I want the processor to include the datatype in the value so I can tell both the datatype and the value in a single read operation. For example, I want to be able to tell a value is an integer, which takes 4 bytes, by reading only 4 bytes. I don't want to read a 5th byte to learn the datatype.
How many datatypes should there be? A dynamic language probably needs up to 16 datatypes internally to bootstrap itself. Eight is too few for some of the languages I looked at, so the datatype will need 4 bits.
Here is a simple way to avoid unboxing values using 4 bits.
C mostly uses three datatypes: char, float, and int. On a typical platform they take up 1, 4, and 4 bytes. It's not that hard to save the datatype with each of these values. Make char take 2 bytes (or trim today's char to 7 bits and keep it in 1 byte, though that leaves only 1 bit for the datatype, so char would need a shorter tag of its own); make int and float have less precision and still use 4 bytes.
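Here is a minimal sketch of what such a 4-byte int could look like, emulated in software. The article does not say where the 4 bits live or which tag numbers are used, so the low-bits placement, the tag values, and the names `tagged_t`, `make_int`, `tag_of`, and `int_of` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag numbers; the article does not assign any. */
enum tag { TAG_INT = 1, TAG_FLOAT = 2, TAG_CHAR = 3 };

#define TAG_BITS 4u    /* 4 bits allow up to 16 datatypes */
#define TAG_MASK 0xFu  /* assumption: the tag sits in the low 4 bits */

typedef uint32_t tagged_t;  /* datatype and value share one 4-byte word */

/* Pack a signed integer into the upper 28 bits of the word. */
static inline tagged_t make_int(int32_t v) {
    assert(v >= -(1 << 27) && v < (1 << 27));  /* must fit in 28 bits */
    return ((uint32_t)v << TAG_BITS) | TAG_INT;
}

/* The datatype is available from the same 4-byte read as the value. */
static inline unsigned tag_of(tagged_t w) {
    return w & TAG_MASK;
}

/* Recover the integer by dropping the tag and sign-extending 28 bits. */
static inline int32_t int_of(tagged_t w) {
    int32_t p = (int32_t)(w >> TAG_BITS);  /* 0 .. 2^28 - 1 */
    if (p & (1 << 27)) {
        p -= (1 << 28);                    /* bit 27 is the sign bit */
    }
    return p;
}
```

With this layout, `tag_of(make_int(7))` yields `TAG_INT` and `int_of(make_int(-1))` yields `-1`, both from a single 4-byte word. Here the shifting and masking are done in software; the point of the proposal is that the processor would do it instead.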
This is space efficient. Although the biggest size increase affects char, the absolute cost of 1 extra byte is low. This cost is counteracted by the benefit of not saving the datatype of int and float in a separate byte. The size of int doesn't need to increase, because with 28 bits left for the value a signed int still stores a number that's big enough to be useful: a maximum of 134,217,727 (2 to the 27th, minus 1) instead of 2,147,483,647 (2 to the 31st, minus 1). I don't know about float, but I imagine it can be made to work. If such lower precision isn't enough, programs can use the remaining C datatypes that offer higher precision, like double and long.
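As a rough sketch of how float might be made to work, one option is to give up the 4 least significant mantissa bits and put the datatype there. This assumes float is an IEEE 754 single-precision value, that the tag lives in the low 4 bits, and that `TAG_FLOAT`, `make_float`, and `float_of` are hypothetical names; none of this comes from an existing processor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_MASK  0xFu  /* assumption: the tag sits in the low 4 bits */
#define TAG_FLOAT 2u    /* hypothetical tag number for float */

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 4-byte float");

/* Overwrite the 4 lowest mantissa bits with the tag.
   This truncates the mantissa from 23 to 19 explicit bits. */
static inline uint32_t make_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to read the bit pattern */
    return (bits & ~TAG_MASK) | TAG_FLOAT;
}

/* Clear the tag bits and reinterpret the word as a float again. */
static inline float float_of(uint32_t w) {
    uint32_t bits = w & ~TAG_MASK;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values such as 1.0 or -2.5 survive the round trip unchanged, while a value like 0.1 comes back slightly smaller in magnitude, with a relative error below 2 to the -19th. Rounding instead of truncating, and the handling of NaN bit patterns, are left out of this sketch.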
It also looks like this won't hurt performance, because char, float, and int still fit within 4 bytes, so none of them crosses the barrier that triggers an additional read operation from memory.
To stay backward compatible, the processor can have a separate mode for saving the datatype in the value. In static mode the processor can work the way it works now, but in dynamic mode arithmetic instructions can leave out the 4 bits used for the datatype and operate only on the rest of the value.
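To make the dynamic mode more concrete, here is a small software emulation of what an integer add could do in that mode. It is only a sketch under assumptions: the tag is taken to be the low 4 bits, `TAG_INT` is a made-up tag number, and `dyn_add` is a hypothetical name for an instruction that does not exist on any current processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS 4u
#define TAG_MASK 0xFu  /* assumption: the tag sits in the low 4 bits */
#define TAG_INT  1u    /* hypothetical tag number for int */

/* Pack a signed integer into the upper 28 bits of a tagged word. */
static inline uint32_t make_int(int32_t v) {
    assert(v >= -(1 << 27) && v < (1 << 27));
    return ((uint32_t)v << TAG_BITS) | TAG_INT;
}

/* Drop the tag and sign-extend the 28-bit value. */
static inline int32_t int_of(uint32_t w) {
    int32_t p = (int32_t)(w >> TAG_BITS);
    if (p & (1 << 27)) {
        p -= (1 << 28);
    }
    return p;
}

/* Emulates a hypothetical dynamic-mode ADD: check both datatypes,
   add with the tag bits excluded, then put the tag back.
   Returns false on a datatype mismatch; real hardware might trap instead. */
static inline bool dyn_add(uint32_t a, uint32_t b, uint32_t *out) {
    if ((a & TAG_MASK) != TAG_INT || (b & TAG_MASK) != TAG_INT) {
        return false;
    }
    /* The values are already aligned above the tag, so one plain add works;
       overflow wraps around within the 28 value bits. */
    *out = ((a & ~TAG_MASK) + (b & ~TAG_MASK)) | TAG_INT;
    return true;
}
```

Adding `make_int(40)` and `make_int(2)` this way produces a word whose `int_of` is 42 and whose tag is still `TAG_INT`, without the program ever unboxing either operand.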
The compiler would need to change too, to match the datatypes of the processor, but in comparison changing the compiler is easy. In the end it's up to the processor to help avoid unboxing values.