Hello,

Could you please consider backporting "bpo-39087: Optimize PyUnicode_AsUTF8AndSize()" (Pull Request #18327 by methane, python/cpython on GitHub) to the Python used by Enigma2?

This would give better performance when you use a C acceleration module whose functions manipulate C strings (char*).
In Python 2.x there was no need to convert a Python string to bytes, but now that conversion is needed every time you want to work with a C string.
This reduces the speedup gained by using C.
The change helps a little.
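
To make the use case concrete, here is a minimal sketch of the kind of C routine I mean. It works only on a char* buffer plus a length, so it knows nothing about Python objects. The name html_escaped_len is made up for illustration; in a real extension module the buffer and length would come from PyUnicode_AsUTF8AndSize(), which is exactly the call the backport makes cheaper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of any real module): returns how many
 * bytes the UTF-8 buffer `src` of `len` bytes needs after HTML escaping,
 * not counting a trailing NUL. */
size_t html_escaped_len(const char *src, size_t len)
{
    size_t out = 0;

    assert(src != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        switch (src[i]) {
        case '&': out += 5; break;  /* &amp;  */
        case '<': out += 4; break;  /* &lt;   */
        case '>': out += 4; break;  /* &gt;   */
        case '"': out += 6; break;  /* &quot; */
        default:  out += 1; break;  /* copied as-is, including UTF-8 bytes */
        }
    }
    return out;
}
```

Because the special characters are all ASCII, the loop can walk the UTF-8 bytes directly without decoding them.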

Please read:
Issue 39087: [C API] No efficient C API to get UTF-8 string from unicode object. - Python tracker
Assume you are writing an extension module that reads string. For example, HTML escape or JSON encode.

There are two courses:

(a) Support three KINDs in the flexible unicode representation.
(b) Get UTF-8 data from the unicode.

(a) will be the fastest on CPython, but there are few drawbacks:

* This is tightly coupled with CPython implementation. It will be slow on PyPy.
* CPython may change the internal representation to UTF-8 in the future, like PyPy.
* You can not easily reuse algorithms written in C that handle `char*`.
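
As a rough illustration of what course (a) means in practice, the sketch below handles the three storage widths separately. It is plain C and does not use the real CPython macros; the enum text_kind and the function count_char are hypothetical stand-ins for the 1-, 2- and 4-byte kinds, chosen only to show that every algorithm needs one loop per width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 1-, 2- and 4-byte string kinds. */
enum text_kind { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Count how often code point `ch` occurs in `n` code points at `data`. */
size_t count_char(enum text_kind kind, const void *data, size_t n, uint32_t ch)
{
    size_t hits = 0;

    assert(data != NULL || n == 0);
    switch (kind) {
    case KIND_1BYTE: {
        const uint8_t *p = data;
        for (size_t i = 0; i < n; i++) hits += (p[i] == ch);
        break;
    }
    case KIND_2BYTE: {
        const uint16_t *p = data;
        for (size_t i = 0; i < n; i++) hits += (p[i] == ch);
        break;
    }
    case KIND_4BYTE: {
        const uint32_t *p = data;
        for (size_t i = 0; i < n; i++) hits += (p[i] == ch);
        break;
    }
    }
    return hits;
}
```

With course (b) the same job needs a single loop over a char* buffer.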

So I believe (b) should be the preferred way.
But CPython doesn't provide an efficient way to get UTF-8 from the unicode object.

* PyUnicode_AsUTF8AndSize(): When the unicode contains a non-ASCII character, it will create a UTF-8 cache. The cache is kept for longer than required. And there is an additional malloc + memcpy to create the cache.

* PyUnicode_AsUTF8String(): It creates a bytes object even when the unicode object is ASCII-only or there is a UTF-8 cache already.

After backporting this improvement, the additional memcpy mentioned above for PyUnicode_AsUTF8AndSize() is no longer needed.


and
Issue 35295: Please clarify whether PyUnicode_AsUTF8AndSize() or PyUnicode_AsUTF8String() is preferred - Python tracker


and

Issue 41784: Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384) - Python tracker

and

Issue 28769: Make PyUnicode_AsUTF8 returning "const char *" rather of "char *" - Python tracker