qiskit/setup.py

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

56 lines
2.0 KiB
Python
Raw Normal View History

# This code is part of Qiskit.
Revise travis configuration, using cmake * Revise the travis configuration for using `cmake` for the several targets, and use "stages" instead of parallel jobs: * define three stages that are executed if the previous one suceeds: 1. "linter and pure python test": executes the linter and a test without compiling the binaries, with the idea of providing quick feedback for PRs. 2. "test": launch the test, including the compilation of binaries, under GNU/Linux Python 3.6 and 3.6; and osx Python 3.6. 3. "deploy doc and pypi": for the stable branch, deploy the docs to the landing page, and when using a specific commit message, build the GNU/Linux and osx wheels, uploading them to test.pypi. * use yaml anchors and definitions to avoid repeating code (and working around travis limitations). * Modify the `cmake``configuration to accomodate the stages flow: * allow conditional creation of compilation and QA targets, mainly for saving some time in some jobs. * move the tests to `cmake/tests.cmake`. * Update the tests: * add a `requires_qe_access` decorator that retrieves QE_TOKEN and QE_URL and appends them to the parameters in an unified manner. * add an environment variable `SKIP_ONLINE_TESTS` that allows to skip the tests that need network access. * replace `TRAVIS_FORK_PULL_REQUEST` with the previous two mechanisms, adding support for AppVeyor as well. * fix a problem with matplotlib under osx headless, effectively skipping `test_visualization.py` during the travis osx jobs. * Move Sphinx to `requirements-dev.txt`.
2018-02-13 05:11:28 +08:00
#
# (C) Copyright IBM 2017.
#
# This code is licensed under the Apache License, Version 2.0. You may
# obtain a copy of this license in the LICENSE.txt file in the root directory
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
#
# Any modifications or derivative works of this code must retain this
# copyright notice, and modified files need to carry a notice indicating
# that they have been altered from the originals.
Revise travis configuration, using cmake * Revise the travis configuration for using `cmake` for the several targets, and use "stages" instead of parallel jobs: * define three stages that are executed if the previous one suceeds: 1. "linter and pure python test": executes the linter and a test without compiling the binaries, with the idea of providing quick feedback for PRs. 2. "test": launch the test, including the compilation of binaries, under GNU/Linux Python 3.6 and 3.6; and osx Python 3.6. 3. "deploy doc and pypi": for the stable branch, deploy the docs to the landing page, and when using a specific commit message, build the GNU/Linux and osx wheels, uploading them to test.pypi. * use yaml anchors and definitions to avoid repeating code (and working around travis limitations). * Modify the `cmake``configuration to accomodate the stages flow: * allow conditional creation of compilation and QA targets, mainly for saving some time in some jobs. * move the tests to `cmake/tests.cmake`. * Update the tests: * add a `requires_qe_access` decorator that retrieves QE_TOKEN and QE_URL and appends them to the parameters in an unified manner. * add an environment variable `SKIP_ONLINE_TESTS` that allows to skip the tests that need network access. * replace `TRAVIS_FORK_PULL_REQUEST` with the previous two mechanisms, adding support for AppVeyor as well. * fix a problem with matplotlib under osx headless, effectively skipping `test_visualization.py` during the travis osx jobs. * Move Sphinx to `requirements-dev.txt`.
2018-02-13 05:11:28 +08:00
"The Qiskit setup file."
import os
from setuptools import setup
Implement multithreaded stochastic swap in rust (#7658) * Implement multithreaded stochastic swap in rust This commit is a rewrite of the core swap trials functionality in the StochasticSwap transpiler pass. Previously this core routine was written using Cython (see #1789) which had great performance, but that implementation was single threaded. The core of the stochastic swap algorithm by it's nature is well suited to be executed in parallel, it attempts a number of random trials and then picks the best result from all the trials and uses that for that layer. These trials can easily be run in parallel as there is no data dependency between the trials (there are shared inputs but read-only). As the algorithm generally scales exponentially the speed up from running the trials in parallel can offset this and improve the scaling of the pass. Running the pass in parallel was previously tried in #4781 using Python multiprocessing but the overhead of launching an additional process and serializing the input arrays for each trial was significantly larger than the speed gains. To run the algorithm efficiently in parallel multithreading is needed to leverage shared memory on shared inputs. This commit rewrites the cython routine using rust. This was done for two reasons. The first is that rust's safety guarantees make dealing with and writing parallel code much easier and safer. It's also multiplatform because the rust language supports native threading primatives in language. The second is while writing parallel cython code using open-mp there are limitations with it, mainly on windows. In practice it was also difficult to write and maintain parallel cython code as it has very strict requirements on python and c code interactions. It was much faster and easier to port it to rust and the performance for each iteration (outside of parallelism) is the same (in some cases marginally faster) in rust. The implementation here reuses the data structures that the previous cython implementation introduced (mainly flattening all the terra objects into 1d or 2d numpy arrays for efficient access from C). The speedups from this PR can be significant, calling transpile() on a 400 qubit (with a depth of 10) QV model circuit targetting a 409 heavy hex coupling map goes from ~200 seconds with the single threaded cython to ~60 seconds with this PR locally on a 32 core system, When transpiling a 1000 qubit (also with a depth of 10) QV model circuit targetting a 1081 qubit heavy hex coupling map goes from taking ~6500 seconds to ~720 seconds. The tradeoff with this PR is for local qiskit-terra development a rust compiler needs to be installed. This is made trivial using rustup (https://rustup.rs/), but it is an additional burden and one that we might not want to make. If so we can look at turning this PR into a separate repository/package that qiskit-terra can depend on. The tradeoff here is that we'll be adding friction to the api boundary between the pass and the core swap trials interface. But, it does ease the dependency on development for qiskit-terra. * Sanitize packaging to support future modules This commit fixes how we package the compiled rust module in qiskit-terra. As a single rust project only gives us a single compiled binary output we can't use the same scheme we did previously with cython with a separate dynamic lib file for each module. This shifts us to making the rust code build a `qiskit._accelerate` module and in that we have submodules for everything we need from compiled code. For this PR there is only one submodule, `stochastic_swap`, so for example the parallel swap_trials routine can be imported from `qiskit._accelerate.stochastic_swap.swap_trials`. In the future we can have additional submodules for other pieces of compiled code in qiskit. For example, the likely next candidate is the pauli expectation value cython module, which we'll likely port to rust and also make parallel (for sufficiently large number of qubits). In that case we'd add a new submodule for that functionality. * Adjust random normal distribution to use correct mean This commit corrects the use of the normal distribution to have the mean set to 1.0. Previously we were doing this out of band for each value by adding 1 to the random value which wasn't necessary because we could just generate it with a mean of 1.0. * Remove unecessary extra scope from locked read This commit removes an unecessary extra scope around the locked read for where we store the best solution. The scope was previously there to release the lock after we check if there is a solution or not. However this wasn't actually needed as we can just do the check inline and the lock will release after the condition block. * Remove unecessary explicit type from opt_edges variable * Fix indices typo in NLayout constructor Co-authored-by: Jake Lishman <jake@binhbar.com> * Remove explicit lifetime annotation from swap_trials Previously the swap_trials() function had an explicit lifetime annotation `'p` which wasn't necessary because the compiler can determine this on it's own. Normally when dealing with numpy views and a Python object (i.e. a GIL handle) we need a lifetime annotation to tell the rust compiler the numpy view and the python gil handle will have the same lifetime. But since swap_trials doesn't take a gil handle and operates purely in rust we don't need this lifetime and the rust compiler can deal with the lifetime of the numpy views on their own. * Use sum() instead of fold() * Fix lint and add rust style and lint checks to CI This commit fixes the python lint failures and also updates the ci configuration for the lint job to also run rust's style and lint enforcement. * Fix returned layout mapping from NLayout This commit fixes the output list from the `layout_mapping()` method of `NLayout`. Previously, it incorrectly would return the wrong indices it should be a list of virtual -> physical to qubit pairs. This commit corrects this error Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com> * Tweak tox configuration to try and reliably build rust extension * Make swap_trials parallelization configurable This commit makes the parallelization of the swap_trials() configurable. This is dones in two ways, first a new argument parallel_threshold is added which takes an optional int which is the number of qubits to switch between a parallel and serial version. The second is that it takes into account the the state of the QISKIT_IN_PARALLEL environment variable. This variable is set to TRUE by parallel_map() when we're running in a multiprocessing context. In those cases also running stochastic swap in parallel will likely just cause too much load as we're potentially oversubscribing work to the number of available CPUs. So, if QISKIT_IN_PARALLEL is set to True we run swap_trials serially. * Revert "Make swap_trials parallelization configurable" This reverts commit 57790c84b03da10fd7296c57b38b54c5bccebf4c. That commit attempted to sovle some issues in test running, mainly around multiple parallel dispatch causing exceess load. But in practice it was broken and caused more issues than it fixed. We'll investigate and add control for the parallelization in a future commit separately after all the tests are passing so we have a good baseline. * Add docs to swap_trials() and remove unecessary num_gates arg * Fix race condition leading to non-deterministic behavior Previously, in the case of circuits that had multiple best possible depth == 1 solutions for a layer, there was a race condition in the fast exit path between the threads which could lead to a non-deterministic result even with a fixed seed. The output was always valid, but which result was dependent on which parallel thread with an ideal solution finished last and wrote to the locked best result last. This was causing weird non-deterministic test failures for some tests because of #1794 as the exact match result would change between runs. This could be a bigger issue because user expectations are that with a fixed seed set on the transpiler that the output circuit will be deterministically reproducible. To address this is issue this commit trades off some performance to ensure we're always returning a deterministic result in this case. This is accomplished by updating/checking if a depth==1 solution has been found in another trial thread we only act (so either exit early or update the already found depth == 1 solution) if that solution already found has a trial number that is less than this thread's trial number. This does limit the effectiveness of the fast exit, but in practice it should hopefully not effect the speed too much. As part of this commit some tests are updated because the new deterministic behavior is slightly different from the previous results from the cython serial implementation. I manually verified that the new output circuits are still valid (it also looks like the quality of the results in some of those cases improved, but this is strictly anecdotal and shouldn't be taken as a general trend with this PR). * Apply suggestions from code review Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com> * Fix compiler errors in previous commit * Revert accidental commit of parallel reduction in compute_cost This was only a for local testing to prove it was a bad idea and was accidently included in the branch. We should not nest the parallel execution like this. * Eliminate short circuit for depth == 1 swap_trial() result This commit eliminates the short circuit fast return in swap_trial() when another trial thread has found an ideal solution. Trying to do this in a parallel context is tricky to make deterministic because in cases of >1 depth == 1 solutions there is an inherent race condition between the threads for writing out their depth == 1 result to the shared location. Different strategies were tried to make this reliably deterministic but there wa still a race condition. Since this was just a performance optimization to avoid doing unnecessary work this commit removes this step. Weighing improved performance against repeatability in the output of the compiler, the reproducible results are more important. After we've adopted a multithreaded stochastic swap we can investigate adding this back as a potential future optimization. * Add missing docstrings * Add section to contributing on installing form source * Make rust python classes pickleable * Add rust compiler install to linux wheel jobs * Try more tox changes to fix docs builds * Revert "Eliminate short circuit for depth == 1 swap_trial() result" This reverts commit c510764a770cb610661bdb3732337cd45ab587fd. The removal there was premature and we had a fix for the non-determinism in place, ignoring a typo which was preventing it from working. Co-Authored-By: Georgios Tsilimigkounakis <45130028+georgios-ts@users.noreply.github.com> * Fix submodule declaration and module attribute on rust classes * Fix rust lint * Fix docs job definition * Disable multiprocessing parallelism in unit tests This commit disables the multiprocessing based parallelism when running unittest jobs in CI. We historically have defaulted the use of multiprocessing in environments only where the "fork" start method is available because this has the best performance and has no caveats around how it is used by users (you don't need an `if __name__ == "__main__"` guard). However, the use of the "fork" method isn't always 100% reliable (see https://bugs.python.org/issue40379), which we saw on Python 3.9 #6188. In unittest CI (and tox) by default we use stestr which spawns (not using fork) parallel workers to run tests in parallel. With this PR this means in unittest we're now running multiple test runner subprocesses, which are executing parallel dispatched code using multiprocessing's fork start method, which is executing multithreaded rust code. This three layers of nesting is fairly reliably hanging as Python's fork doesn't seem to be able to handle this many layers of nested parallelism. There are 2 ways I've been able to fix this, the first is to change the start method used by `parallel_map()` to either "spawn" or "forkserver" either of these does not suffer from random hanging. However, doing this in the unittest context causes significant overhead and slows down test executing significantly. The other is to just disable the multiprocessing which fixes the hanging and doesn't impact runtime performance signifcantly (and might actually help in CI so we're not oversubscribing the limited resources. As I have not been able to reproduce `parallel_map()` hanging in a standalone context with multithreaded stochastic swap this commit opts for just disabling multiprocessing in CI and documenting the known issue in the release notes as this is the simpler solution. It's unlikely that users will nest parallel processes as it typically hurts performance (and parallel_map() actively guards against it), we only did it in testing previously because the tests which relied on it were a small portion of the test suite (roughly 65 tests) and typically did not have a significant impact on the total throughput of the test suite. * Fix typo in azure pipelines config * Remove unecessary extension compilation for image tests * Add test script to explicitly verify parallel dispatch In an earlier commit we disabled the use of parallel dispatch in parallel_map() to avoid a bug in cpython associated with their fork() based subprocess launch. Doing this works around the bug which was reliably triggered by running multiprocessing in parallel subprocesses. It also has the side benefit of providing a ~2x speed up for test suite execution in CI. However, this meant we lost our test coverage in CI for running parallel_map() with actual multiprocessing based parallel dispatch. To ensure we don't inadvertandtly regress this code path moving forward this commit adds a dedicated test script which runs a simple transpilation in parallel and verifies that everything works as expected with the default parallelism settings. * Avoid multi-threading when run in a multiprocessing context This commit adds a switch on running between a single threaded and a multithreaded variant of the swap_trials loop based on whether the QISKIT_IN_PARALLEL flag is set. If QISKIT_IN_PARALLEL is set to TRUE this means the `parallel_map()` function is running in the outer python context and we're running in multiprocessing already. This means we do not want to be running in multiple threads generally as that will lead to potential resource exhaustion by spawn n processes each potentially running with m threads where `n` is `min(num_phys_cpus, num_tasks)` and `m` is num_logical_cpus (although only `min(num_logical_cpus, num_trials)` will be active) which on the typical system there aren't enough cores to leverage both multiprocessing and multithreading. However, in case a user does have such an environment they can set the `QISKIT_FORCE_THREADS` env variable to `TRUE` which will use threading regardless of the status of `QISKIT_IN_PARALLEL`. * Apply suggestions from code review Co-authored-by: Jake Lishman <jake@binhbar.com> * Minor fixes from review comments This commits fixes some minor details found during code review. It expands the section on building from source to explain how to build a release optimized binary with editable mode, makes the QISKIT_PARALLEL env variable usage consistent across all jobs, and adds a missing shebang to the `install_rush.sh` script which is used to install rust in the manylinux container environment. * Simplify tox configuration In earlier commits the tox configuration was changed to try and fix the docs CI job by going to great effort to try and enforce that setuptools-rust was installed in all situations, even before it was actually needed. However, the problem with the docs ci job was unrelated to the tox configuration and this reverts the configuration to something that works with more versions of tox and setuptools-rust. * Add missing pieces of cargo configuration Co-authored-by: Jake Lishman <jake@binhbar.com> Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-01 05:49:54 +08:00
from setuptools_rust import Binding, RustExtension
# Most of this configuration is managed by `pyproject.toml`. This only includes the extra bits to
# configure `setuptools-rust`, because we do a little dynamic trick with the debug setting, and we
# also want an explicit `setup.py` file to exist so we can manually call
#
# python setup.py build_rust --inplace --release
#
# to make optimized Rust components even for editable releases, which would otherwise be quite
# unergonomic to do otherwise.
Implement multithreaded stochastic swap in rust (#7658) * Implement multithreaded stochastic swap in rust This commit is a rewrite of the core swap trials functionality in the StochasticSwap transpiler pass. Previously this core routine was written using Cython (see #1789) which had great performance, but that implementation was single threaded. The core of the stochastic swap algorithm by it's nature is well suited to be executed in parallel, it attempts a number of random trials and then picks the best result from all the trials and uses that for that layer. These trials can easily be run in parallel as there is no data dependency between the trials (there are shared inputs but read-only). As the algorithm generally scales exponentially the speed up from running the trials in parallel can offset this and improve the scaling of the pass. Running the pass in parallel was previously tried in #4781 using Python multiprocessing but the overhead of launching an additional process and serializing the input arrays for each trial was significantly larger than the speed gains. To run the algorithm efficiently in parallel multithreading is needed to leverage shared memory on shared inputs. This commit rewrites the cython routine using rust. This was done for two reasons. The first is that rust's safety guarantees make dealing with and writing parallel code much easier and safer. It's also multiplatform because the rust language supports native threading primatives in language. The second is while writing parallel cython code using open-mp there are limitations with it, mainly on windows. In practice it was also difficult to write and maintain parallel cython code as it has very strict requirements on python and c code interactions. It was much faster and easier to port it to rust and the performance for each iteration (outside of parallelism) is the same (in some cases marginally faster) in rust. The implementation here reuses the data structures that the previous cython implementation introduced (mainly flattening all the terra objects into 1d or 2d numpy arrays for efficient access from C). The speedups from this PR can be significant, calling transpile() on a 400 qubit (with a depth of 10) QV model circuit targetting a 409 heavy hex coupling map goes from ~200 seconds with the single threaded cython to ~60 seconds with this PR locally on a 32 core system, When transpiling a 1000 qubit (also with a depth of 10) QV model circuit targetting a 1081 qubit heavy hex coupling map goes from taking ~6500 seconds to ~720 seconds. The tradeoff with this PR is for local qiskit-terra development a rust compiler needs to be installed. This is made trivial using rustup (https://rustup.rs/), but it is an additional burden and one that we might not want to make. If so we can look at turning this PR into a separate repository/package that qiskit-terra can depend on. The tradeoff here is that we'll be adding friction to the api boundary between the pass and the core swap trials interface. But, it does ease the dependency on development for qiskit-terra. * Sanitize packaging to support future modules This commit fixes how we package the compiled rust module in qiskit-terra. As a single rust project only gives us a single compiled binary output we can't use the same scheme we did previously with cython with a separate dynamic lib file for each module. This shifts us to making the rust code build a `qiskit._accelerate` module and in that we have submodules for everything we need from compiled code. For this PR there is only one submodule, `stochastic_swap`, so for example the parallel swap_trials routine can be imported from `qiskit._accelerate.stochastic_swap.swap_trials`. In the future we can have additional submodules for other pieces of compiled code in qiskit. For example, the likely next candidate is the pauli expectation value cython module, which we'll likely port to rust and also make parallel (for sufficiently large number of qubits). In that case we'd add a new submodule for that functionality. * Adjust random normal distribution to use correct mean This commit corrects the use of the normal distribution to have the mean set to 1.0. Previously we were doing this out of band for each value by adding 1 to the random value which wasn't necessary because we could just generate it with a mean of 1.0. * Remove unecessary extra scope from locked read This commit removes an unecessary extra scope around the locked read for where we store the best solution. The scope was previously there to release the lock after we check if there is a solution or not. However this wasn't actually needed as we can just do the check inline and the lock will release after the condition block. * Remove unecessary explicit type from opt_edges variable * Fix indices typo in NLayout constructor Co-authored-by: Jake Lishman <jake@binhbar.com> * Remove explicit lifetime annotation from swap_trials Previously the swap_trials() function had an explicit lifetime annotation `'p` which wasn't necessary because the compiler can determine this on it's own. Normally when dealing with numpy views and a Python object (i.e. a GIL handle) we need a lifetime annotation to tell the rust compiler the numpy view and the python gil handle will have the same lifetime. But since swap_trials doesn't take a gil handle and operates purely in rust we don't need this lifetime and the rust compiler can deal with the lifetime of the numpy views on their own. * Use sum() instead of fold() * Fix lint and add rust style and lint checks to CI This commit fixes the python lint failures and also updates the ci configuration for the lint job to also run rust's style and lint enforcement. * Fix returned layout mapping from NLayout This commit fixes the output list from the `layout_mapping()` method of `NLayout`. Previously, it incorrectly would return the wrong indices it should be a list of virtual -> physical to qubit pairs. This commit corrects this error Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com> * Tweak tox configuration to try and reliably build rust extension * Make swap_trials parallelization configurable This commit makes the parallelization of the swap_trials() configurable. This is dones in two ways, first a new argument parallel_threshold is added which takes an optional int which is the number of qubits to switch between a parallel and serial version. The second is that it takes into account the the state of the QISKIT_IN_PARALLEL environment variable. This variable is set to TRUE by parallel_map() when we're running in a multiprocessing context. In those cases also running stochastic swap in parallel will likely just cause too much load as we're potentially oversubscribing work to the number of available CPUs. So, if QISKIT_IN_PARALLEL is set to True we run swap_trials serially. * Revert "Make swap_trials parallelization configurable" This reverts commit 57790c84b03da10fd7296c57b38b54c5bccebf4c. That commit attempted to sovle some issues in test running, mainly around multiple parallel dispatch causing exceess load. But in practice it was broken and caused more issues than it fixed. We'll investigate and add control for the parallelization in a future commit separately after all the tests are passing so we have a good baseline. * Add docs to swap_trials() and remove unecessary num_gates arg * Fix race condition leading to non-deterministic behavior Previously, in the case of circuits that had multiple best possible depth == 1 solutions for a layer, there was a race condition in the fast exit path between the threads which could lead to a non-deterministic result even with a fixed seed. The output was always valid, but which result was dependent on which parallel thread with an ideal solution finished last and wrote to the locked best result last. This was causing weird non-deterministic test failures for some tests because of #1794 as the exact match result would change between runs. This could be a bigger issue because user expectations are that with a fixed seed set on the transpiler that the output circuit will be deterministically reproducible. To address this is issue this commit trades off some performance to ensure we're always returning a deterministic result in this case. This is accomplished by updating/checking if a depth==1 solution has been found in another trial thread we only act (so either exit early or update the already found depth == 1 solution) if that solution already found has a trial number that is less than this thread's trial number. This does limit the effectiveness of the fast exit, but in practice it should hopefully not effect the speed too much. As part of this commit some tests are updated because the new deterministic behavior is slightly different from the previous results from the cython serial implementation. I manually verified that the new output circuits are still valid (it also looks like the quality of the results in some of those cases improved, but this is strictly anecdotal and shouldn't be taken as a general trend with this PR). * Apply suggestions from code review Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com> * Fix compiler errors in previous commit * Revert accidental commit of parallel reduction in compute_cost This was only a for local testing to prove it was a bad idea and was accidently included in the branch. We should not nest the parallel execution like this. * Eliminate short circuit for depth == 1 swap_trial() result This commit eliminates the short circuit fast return in swap_trial() when another trial thread has found an ideal solution. Trying to do this in a parallel context is tricky to make deterministic because in cases of >1 depth == 1 solutions there is an inherent race condition between the threads for writing out their depth == 1 result to the shared location. Different strategies were tried to make this reliably deterministic but there wa still a race condition. Since this was just a performance optimization to avoid doing unnecessary work this commit removes this step. Weighing improved performance against repeatability in the output of the compiler, the reproducible results are more important. After we've adopted a multithreaded stochastic swap we can investigate adding this back as a potential future optimization. * Add missing docstrings * Add section to contributing on installing form source * Make rust python classes pickleable * Add rust compiler install to linux wheel jobs * Try more tox changes to fix docs builds * Revert "Eliminate short circuit for depth == 1 swap_trial() result" This reverts commit c510764a770cb610661bdb3732337cd45ab587fd. The removal there was premature and we had a fix for the non-determinism in place, ignoring a typo which was preventing it from working. Co-Authored-By: Georgios Tsilimigkounakis <45130028+georgios-ts@users.noreply.github.com> * Fix submodule declaration and module attribute on rust classes * Fix rust lint * Fix docs job definition * Disable multiprocessing parallelism in unit tests This commit disables the multiprocessing based parallelism when running unittest jobs in CI. We historically have defaulted the use of multiprocessing in environments only where the "fork" start method is available because this has the best performance and has no caveats around how it is used by users (you don't need an `if __name__ == "__main__"` guard). However, the use of the "fork" method isn't always 100% reliable (see https://bugs.python.org/issue40379), which we saw on Python 3.9 #6188. In unittest CI (and tox) by default we use stestr which spawns (not using fork) parallel workers to run tests in parallel. With this PR this means in unittest we're now running multiple test runner subprocesses, which are executing parallel dispatched code using multiprocessing's fork start method, which is executing multithreaded rust code. This three layers of nesting is fairly reliably hanging as Python's fork doesn't seem to be able to handle this many layers of nested parallelism. There are 2 ways I've been able to fix this, the first is to change the start method used by `parallel_map()` to either "spawn" or "forkserver" either of these does not suffer from random hanging. However, doing this in the unittest context causes significant overhead and slows down test executing significantly. The other is to just disable the multiprocessing which fixes the hanging and doesn't impact runtime performance signifcantly (and might actually help in CI so we're not oversubscribing the limited resources. As I have not been able to reproduce `parallel_map()` hanging in a standalone context with multithreaded stochastic swap this commit opts for just disabling multiprocessing in CI and documenting the known issue in the release notes as this is the simpler solution. It's unlikely that users will nest parallel processes as it typically hurts performance (and parallel_map() actively guards against it), we only did it in testing previously because the tests which relied on it were a small portion of the test suite (roughly 65 tests) and typically did not have a significant impact on the total throughput of the test suite. * Fix typo in azure pipelines config * Remove unecessary extension compilation for image tests * Add test script to explicitly verify parallel dispatch In an earlier commit we disabled the use of parallel dispatch in parallel_map() to avoid a bug in cpython associated with their fork() based subprocess launch. Doing this works around the bug which was reliably triggered by running multiprocessing in parallel subprocesses. It also has the side benefit of providing a ~2x speed up for test suite execution in CI. However, this meant we lost our test coverage in CI for running parallel_map() with actual multiprocessing based parallel dispatch. To ensure we don't inadvertandtly regress this code path moving forward this commit adds a dedicated test script which runs a simple transpilation in parallel and verifies that everything works as expected with the default parallelism settings. * Avoid multi-threading when run in a multiprocessing context This commit adds a switch on running between a single threaded and a multithreaded variant of the swap_trials loop based on whether the QISKIT_IN_PARALLEL flag is set. If QISKIT_IN_PARALLEL is set to TRUE this means the `parallel_map()` function is running in the outer python context and we're running in multiprocessing already. This means we do not want to be running in multiple threads generally as that will lead to potential resource exhaustion by spawn n processes each potentially running with m threads where `n` is `min(num_phys_cpus, num_tasks)` and `m` is num_logical_cpus (although only `min(num_logical_cpus, num_trials)` will be active) which on the typical system there aren't enough cores to leverage both multiprocessing and multithreading. However, in case a user does have such an environment they can set the `QISKIT_FORCE_THREADS` env variable to `TRUE` which will use threading regardless of the status of `QISKIT_IN_PARALLEL`. * Apply suggestions from code review Co-authored-by: Jake Lishman <jake@binhbar.com> * Minor fixes from review comments This commits fixes some minor details found during code review. It expands the section on building from source to explain how to build a release optimized binary with editable mode, makes the QISKIT_PARALLEL env variable usage consistent across all jobs, and adds a missing shebang to the `install_rush.sh` script which is used to install rust in the manylinux container environment. * Simplify tox configuration In earlier commits the tox configuration was changed to try and fix the docs CI job by going to great effort to try and enforce that setuptools-rust was installed in all situations, even before it was actually needed. However, the problem with the docs ci job was unrelated to the tox configuration and this reverts the configuration to something that works with more versions of tox and setuptools-rust. * Add missing pieces of cargo configuration Co-authored-by: Jake Lishman <jake@binhbar.com> Co-authored-by: georgios-ts <45130028+georgios-ts@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-01 05:49:54 +08:00
Add Rust-based OpenQASM 2 converter (#9784) * Add Rust-based OpenQASM 2 converter This is a vendored version of qiskit-qasm2 (https://pypi.org/project/qiskit-qasm2), with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package. This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construction `QuantumCircuit`s that are not correct) and more extensible. The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions. Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example: - the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal. - the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary. - there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor. I'll probably keep playing with some of those in the `qiskit-qasm2` package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one: In [1]: import qiskit.qasm2 ...: prog = """ ...: OPENQASM 2.0; ...: include "qelib1.inc"; ...: qreg q[2]; ...: """ ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000 ...: %timeit qiskit.qasm2.loads(prog) ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog) 2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) `cx`-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time of `CXGate` is higher than most gates, and this dominates the execution time for the Rust-based parser. * Work around docs failure on Sphinx 5.3, Python 3.9 The version of Sphinx that we're constrained to use in the docs build can't handle the `Unpack` operator, so as a temporary measure we can just relax the type hint a little. * Remove unused import * Tweak documentation * More specific PyO3 usage * Use PathBuf directly for paths * Format * Freeze dataclass * Use type-safe id types This should have no impact on runtime or on memory usage, since each of the new types has the same bit width and alignment as the `usize` values they replace. * Documentation tweaks * Fix comments in lexer * Fix lexing version number with separating comments * Add test of pathological formatting * Fixup release note * Fix handling of u0 gate * Credit reviewers Co-authored-by: Luciano Bello <bel@zurich.ibm.com> Co-authored-by: Kevin Hartman <kevin@hart.mn> Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com> * Add test of invalid gate-body statements * Refactor custom built-in gate definitions The previous system was quite confusing, and required all accesses to the global symbol table to know that the `Gate` symbol could be present but overridable. This led to confusing logic, various bugs and unnecessary constraints, such as it previously being (erroneously) possible to provide re-definitions for any "built-in" gate. Instead, we keep a separate store of instructions that may be redefined. This allows the logic to be centralised to only to the place responsible for performing those overrides, and remains accessible for error-message builders to query in order to provide better diagnostics. * Credit Sasha Co-authored-by: Alexander Ivrii <alexi@il.ibm.com> * Credit Matthew Co-authored-by: Matthew Treinish <mtreinish@kortar.org> * Remove dependency on `lazy_static` For a hashset of only 6 elements that is only checked once, there's not really any point to pull in an extra dependency or use a hash set at all. * Update PyO3 version --------- Co-authored-by: Luciano Bello <bel@zurich.ibm.com> Co-authored-by: Kevin Hartman <kevin@hart.mn> Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com> Co-authored-by: Alexander Ivrii <alexi@il.ibm.com> Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
2023-04-13 00:00:54 +08:00
# If RUST_DEBUG is set, force compiling in debug mode. Else, use the default behavior of whether
# it's an editable installation.
rust_debug = True if os.getenv("RUST_DEBUG") == "1" else None
Add infrastructure for gates, instruction, and operations in Rust (#12459) * Add infrastructure for gates, instruction, and operations in Rust This commit adds a native representation of Gates, Instruction, and Operations to rust's circuit module. At a high level this works by either wrapping the Python object in a rust wrapper struct that tracks metadata about the operations (name, num_qubits, etc) and then for other details it calls back to Python to get dynamic details like the definition, matrix, etc. For standard library gates like Swap, CX, H, etc this replaces the on-circuit representation with a new rust enum StandardGate. The enum representation is much more efficient and has a minimal memory footprint (just the enum variant and then any parameters or other mutable state stored in the circuit instruction). All the gate properties such as the matrix, definiton, name, etc are statically defined in rust code based on the enum variant (which represents the gate). The use of an enum to represent standard gates does mean a change in what we store on a CircuitInstruction. To represent a standard gate fully we need to store the mutable properties of the existing Gate class on the circuit instruction as the gate by itself doesn't contain this detail. That means, the parameters, label, unit, duration, and condition are added to the rust side of circuit instrucion. However no Python side access methods are added for these as they're internal only to the Rust code. In Qiskit 2.0 to simplify this storage we'll be able to drop, unit, duration, and condition from the api leaving only label and parameters. But for right now we're tracking all of the fields. To facilitate working with circuits and gates full from rust the setting the `operation` attribute of a `CircuitInstruction` object now transltates the python object to an internal rust representation. For standard gates this translates it to the enum form described earlier, and for other circuit operations 3 new Rust structs: PyGate, PyInstruction, and PyOperation are used to wrap the underlying Python object in a Rust api. These structs cache some commonly accessed static properties of the operation, such as the name, number of qubits, etc. However for dynamic pieces, such as the definition or matrix, callback to python to get a rust representation for those. Similarly whenever the `operation` attribute is accessed from Python it converts it back to the normal Python object representation. For standard gates this involves creating a new instance of a Python object based on it's internal rust representation. For the wrapper structs a reference to the wrapped PyObject is returned. To manage the 4 variants of operation (`StandardGate`, `PyGate`, `PyInstruction`, and `PyOperation`) a new Rust trait `Operation` is created that defines a standard interface for getting the properties of a given circuit operation. This common interface is implemented for the 4 variants as well as the `OperationType` enum which wraps all 4 (and is used as the type for `CircuitInstruction.operation` in the rust code. As everything in the `QuantumCircuit` data model is quite coupled moving the source of truth for the operations to exist in Rust means that more of the underlying `QuantumCircuit`'s responsibility has to move to Rust as well. Primarily this involves the `ParameterTable` which was an internal class for tracking which instructions in the circuit have a `ParameterExpression` parameter so that when we go to bind parameters we can lookup which operations need to be updated with the bind value. Since the representation of those instructions now lives in Rust and Python only recieves a ephemeral copy of the instructions the ParameterTable had to be reimplemented in Rust to track the instructions. This new parameter table maps the Parameter's uuid (as a u128) as a unique identifier for each parameter and maps this to a positional index in the circuit data to the underlying instruction using that parameter. This is a bit different from the Python parameter table which was mapping a parameter object to the id of the operation object using that parmaeter. This also leads to a difference in the binding mechanics as the parameter assignment was done by reference in the old model, but now we need to update the entire instruction more explicitly in rust. Additionally, because the global phase of a circuit can be parameterized the ownership of global phase is moved from Python into Rust in this commit as well. After this commit the only properties of a circuit that are not defined in Rust for the source of truth are the bits (and vars) of the circuit, and when creating circuits from rust this is what causes a Python interaction to still be required. This commit does not translate the full standard library of gates as that would make the pull request huge, instead this adds the basic infrastructure for having a more efficient standard gate representation on circuits. There will be follow up pull requests to add the missing gates and round out support in rust. The goal of this pull request is primarily to add the infrastructure for representing the full circuit model (and dag model in the future) in rust. By itself this is not expected to improve runtime performance (if anything it will probably hurt performance because of extra type conversions) but it is intended to enable writing native circuit manipulations in Rust, including transpiler passes without needing involvement from Python. Longer term this should greatly improve the runtime performance and reduce the memory overhead of Qiskit. But, this is just an early step towards that goal, and is more about unlocking the future capability. The next steps after this commit are to finish migrating the standard gate library and also update the `QuantumCircuit` methods to better leverage the more complete rust representation (which should help offset the performance penalty introduced by this). Fixes: #12205 * Fix Python->Rust Param conversion This commit adds a custom implementation of the FromPyObject trait for the Param enum. Previously, the Param trait derived it's impl of the trait, but this logic wasn't perfect. In cases whern a ParameterExpression was effectively a constant (such as `0 * x`) the trait's attempt to coerce to a float first would result in those ParameterExpressions being dropped from the circuit at insertion time. This was a change in behavior from before having gates in Rust as the parameters would disappear from the circuit at insertion time instead of at bind time. This commit fixes this by having a custom impl for FromPyObject that first tries to figure out if the parameter is a ParameterExpression (or a QuantumCircuit) by using a Python isinstance() check, then tries to extract it as a float, and finally stores a non-parameter object; which is a new variant in the Param enum. This new variant also lets us simplify the logic around adding gates to the parameter table as we're able to know ahead of time which gate parameters are `ParameterExpression`s and which are other objects (and don't need to be tracked in the parameter table. Additionally this commit tweaks two tests, the first is test.python.circuit.library.test_nlocal.TestNLocal.test_parameters_setter which was adjusted in the previous commit to workaround the bug fixed by this commit. The second is test.python.circuit.test_parameters which was testing that a bound ParameterExpression with a value of 0 defaults to an int which was a side effect of passing an int input to symengine for the bind value and not part of the api and didn't need to be checked. This assertion was removed from the test because the rust representation is only storing f64 values for the numeric parameters and it is never an int after binding from the Python perspective it isn't any different to have float(0) and int(0) unless you explicit isinstance check like the test previously was. * Fix qasm3 exporter for std gates without stdgates.inc This commit fixes the handling of standard gates in Qiskit when the user specifies excluding the use of the stdgates.inc file from the exported qasm. Previously the object id of the standard gates were used to maintain a lookup table of the global definitions for all the standard gates explicitly in the file. However, the rust refactor means that every time the exporter accesses `circuit.data[x].operation` a new instance is returned. This means that on subsequent lookups for the definition the gate definitions are never found. To correct this issue this commit adds to the lookup table a fallback of the gate name + parameters to do the lookup for. This should be unique for any standard gate and not interfere with the previous logic that's still in place and functional for other custom gate definitions. While this fixes the logic in the exporter the test is still failing because the test is asserting the object ids are the same in the qasm3 file, which isn't the case anymore. The test will be updated in a subsequent commit to validate the qasm3 file is correct without using a hardcoded object id. * Fix base scheduler analysis pass duration setting When ALAPScheduleAnalysis and ASAPScheduleAnalysis were setting the duration of a gate they were doing `node.op.duration = duration` this wasn't always working because if `node.op` was a standard gate it returned a new Python object created from the underlying rust representation. This commit fixes the passes so that they modify the duration and then explicit set the operation to update it's rust representation. * Fix python lint * Fix last failing qasm3 test for std gates without stdgates.inc While the logic for the qasm3 exporter was fixed in commit a6e69ba4c99cdd2fa50ed10399cada322bc88903 to handle the edge case of a user specifying that the qasm exporter does not use the stdgates.inc include file in the output, but also has qiskit's standard gates in their circuit being exported. The one unit test to provide coverage for that scenario was not passing because when an id was used for the gate definitions in the qasm3 file it was being referenced against a temporary created by accessing a standard gate from the circuit and the ids weren't the same so the reference string didn't match what the exporter generated. This commit fixes this by changing the test to not do an exact string comparison, but instead a line by line comparison that either does exact equality check or a regex search for the expected line and the ids are checked as being any 15 character integer. * Remove superfluous comment * Cache imported classes with GILOnceCell * Remove unused python variables * Add missing file * Update QuantumCircuit gate methods to bypass Python object This commit updates the QuantumCircuit gate methods which add a given gate to the circuit to bypass the python gate object creation and directly insert a rust representation of the gate. This avoids a conversion in the rust side of the code. While in practice this is just the Python side object creation and a getattr for the rust code to determine it's a standard gate that we're skipping. This may add up over time if there are a lot of gates being created by the method. To accomplish this the rust code handling the mapping of rust StandardGate variants to the Python classes that represent those gates needed to be updated as well. By bypassing the python object creation we need a fallback to populate the gate class for when a user access the operation object from Python. Previously this mapping was only being populated at insertion time and if we never insert the python object (for a circuit created only via the methods) then we need a way to find what the gate class is. A static lookup table of import paths and class names are added to `qiskit_circuit::imports` module to faciliate this and helper functions are added to facilitate interacting with the class objects that represent each gate. * Deduplicate gate matrix definitions * Fix lint * Attempt to fix qasm3 test failure * Add compile time option to cache py gate returns for rust std gates This commit adds a new rust crate feature flag for the qiskit-circuits and qiskit-pyext that enables caching the output from CircuitInstruction.operation to python space. Previously, for memory efficiency we were reconstructing the python object on demand for every access. This was to avoid carrying around an extra pointer and keeping the ephemeral python object around longer term if it's only needed once. But right now nothing is directly using the rust representation yet and everything is accessing via the python interface, so recreating gate objects on the fly has a huge performance penalty. To avoid that this adds caching by default as a temporary solution to avoid this until we have more usage of the rust representation of gates. There is an inherent tension between an optimal rust representation and something that is performant for Python access and there isn't a clear cut answer on which one is better to optimize for. A build time feature lets the user pick, if what we settle on for the default doesn't agree with their priorities or use case. Personally I'd like to see us disable the caching longer term (hopefully before releasing this functionality), but that's dependent on a sufficent level of usage from rust superseding the current Python space usage in the core of Qiskit. * Add num_nonlocal_gates implementation in rust This commit adds a native rust implementation to rust for the num_nonlocal_gates method on QuantumCircuit. Now that we have a rust representation of gates it is potentially faster to do the count because the iteration and filtering is done rust side. * Performance tuning circuit construction This commit fixes some performance issues with the addition of standard gates to a circuit. To workaround potential reference cycles in Python when calling rust we need to check the parameters of the operation. This was causing our fast path for standard gates to access the `operation` attribute to get the parameters. This causes the gate to be eagerly constructed on the getter. However, the reference cycle case can only happen in situations without a standard gate, and the fast path for adding standard gates directly won't need to run this so a skip is added if we're adding a standard gate. * Add back validation of parameters on gate methods In the previous commit a side effect of the accidental eager operation creation was that the parameter input for gates were being validated by that. By fixing that in the previous commit the validation of input parameters on the circuit methods was broken. This commit fixes that oversight and adds back the validation. * Skip validation on gate creation from rust * Offload operation copying to rust This commit fixes a performance regression in the `QuantumCircuit.copy()` method which was previously using Python to copy the operations which had extra overhead to go from rust to python and vice versa. This moves that logic to exist in rust and improve the copy performance. * Fix lint * Perform deepcopy in rust This commit moves the deepcopy handling to occur solely in Rust. Previously each instruction would be directly deepcopied by iterating over the circuit data. However, we can do this rust side now and doing this is more efficient because while we need to rely on Python to run a deepcopy we can skip it for the Rust standard gates and rely on Rust to copy those gates. * Fix QuantumCircuit.compose() performance regression This commit fixes a performance regression in the compose() method. This was caused by the checking for classical conditions in the method requiring eagerly converting all standard gates to a Python object. This changes the logic to do this only if we know we have a condition (which we can determine Python side now). * Fix map_ops test case with no caching case * Fix typos in docs This commit fixes several docs typos that were caught during code review. Co-authored-by: Eli Arbel <46826214+eliarbel@users.noreply.github.com> * Shrink memory usage for extra mutable instruction state This commit changes how we store the extra mutable instruction state (condition, duration, unit, and label) for each `CircuitInstruction` and `PackedInstruction` in the circuit. Previously it was all stored as separate `Option<T>` fields on the struct, which required at least a pointer's width for each field which was wasted space the majority of the time as using these fields are not common. To optimize the memory layout of the struct this moves these attributes to a new struct which is put in an `Option<Box<_>>` which reduces it from 4 pointer widths down to 1 per object. This comes from extra runtime cost from the extra layer of pointer indirection but as this is the uncommon path this tradeoff is fine. * Remove Option<> from params field in CircuitInstruction This commit removes the Option<> from the params field in CircuitInstruction. There is no real distinction between an empty vec and None in this case, so the option just added another layer in the API that we didn't need to deal with. Also depending on the memory alignment using an Option<T> might have ended up in a little extra memory usage too, so removing it removes that potential source of overhead. * Eagerly construct rust python wrappers in .append() This commit updates the Python code in QuantumCircuit.append() method to eagerly construct the rust wrapper objects for python defined circuit operations. * Simplify code around handling python errors in rust * Revert "Skip validation on gate creation from rust" This reverts commit 2f81bde8bf32b06b4165048896eabdb36470814d. The validation skipping was unsound in some cases and could lead to invalid circuit being generated. If we end up needing this as an optimization we can remove this in the future in a follow-up PR that explores this in isolation. * Temporarily use git for qasm3 import In Qiskit/qiskit-qasm3-import#34 the issue we're hitting caused by qiskit-qasm3-import using the private circuit attributes removed in this PR was fixed. This commit temporarily moves to installing it from git so we can fully run CI. When qiskit-qasm3-import is released we should revert this commit. * Fix lint * Fix lint for real (we really need to use a py312 compatible version of pylint) * Fix test failure caused by incorrect lint fix * Relax trait-method typing requirements * Encapsulate `GILOnceCell` initialisers to local logic * Simplify Interface for building circuit of standard gates in rust * Simplify complex64 creation in gate_matrix.rs This just switches Complex64::new(re, im) to be c64(re, im) to reduce the amount of typing. c64 needs to be defined inplace so it can be a const fn. * Simplify initialization of array of elements that are not Copy (#28) * Simplify initialization of array of elements that are not Copy * Only generate array when necessary * Fix doc typos Co-authored-by: Kevin Hartman <kevin@hart.mn> * Add conversion trait for OperationType -> OperationInput and simplify CircuitInstruction::replace() * Use destructuring for operation_type_to_py extra attr handling * Simplify trait bounds for map_indices() The map_indices() method previously specified both Iterator and ExactSizeIterator for it's trait bounds, but Iterator is a supertrait of ExactSizeIterator and we don't need to explicitly list both. This commit removes the duplicate trait bound. * Make Qubit and Clbit newtype member public As we start to use Qubit and Clbit for creating circuits from accelerate and other crates in the Qiskit workspace we need to be able to create instances of them. However, the newtype member BitType was not public which prevented creating new Qubits. This commit fixes this by making it public. * Use snakecase for gate matrix names * Remove pointless underscore prefix * Use downcast instead of bound * Rwork _append reference cycle handling This commit reworks the multiple borrow handling in the _append() method to leveraging `Bound.try_borrow()` to return a consistent error message if we're unable to borrow a CircuitInstruction in the rust code meaning there is a cyclical reference in the code. Previously we tried to detect this cycle up-front which added significant overhead for a corner case. * Make CircuitData.global_phase_param_index a class attr * Use &[Param] instead of &SmallVec<..> for operation_type_and_data_to_py * Have get_params_unsorted return a set * Use lookup table for static property methods of StandardGate * Use PyTuple::empty_bound() * Fix lint * Add missing test method docstring * Reuse allocations in parameter table update * Remove unnecessary global phase zeroing * Move manually set params to a separate function * Fix release note typo * Use constant for global-phase index * Switch requirement to release version --------- Co-authored-by: Eli Arbel <46826214+eliarbel@users.noreply.github.com> Co-authored-by: Jake Lishman <jake.lishman@ibm.com> Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com> Co-authored-by: Kevin Hartman <kevin@hart.mn>
2024-06-13 18:48:40 +08:00
# If QISKIT_NO_CACHE_GATES is set then don't enable any features while building
#
# TODO: before final release we should reverse this by default once the default transpiler pass
# is all in rust (default to no caching and make caching an opt-in feature). This is opt-out
# right now to avoid the runtime overhead until we are leveraging the rust gates infrastructure.
if os.getenv("QISKIT_NO_CACHE_GATES") == "1":
features = []
else:
features = ["cache_pygates"]
setup(
rust_extensions=[
RustExtension(
"qiskit._accelerate",
Refactor Rust crates to build a single extension module (#12134) * Refactor Rust crates to build a single extension module Previously we were building three different Python extension modules out of Rust space. This was convenient for logical separation of the code, and for making the incremental compilations of the smaller crates faster, but had the disadvantage that `qasm2` and `qasm3` couldn't access the new circuit data structures directly from Rust space and had to go via Python. This modifies the crate structure to put the Rust-space acceleration logic into crates that we only build as Rust libraries, then adds another (`pyext`) crate to be the only that that actually builds as shared Python C extension. This then lets the Rust crates interact with each other (and the Rust crates still contain PyO3 Python-binding logic internally) and accept and output each others' Python types. The one Python extension is still called `qiskit._accelerate`, but it's built by the Rust crate `qiskit-pyext`. This necessitated some further changes, since `qiskit._accelerate` now contains acccelerators for lots of parts of the Qiskit package hierarchy, but needs to have a single low-level place to initialise itself: - The circuit logic `circuit` is separated out into its own separate crate so that this can form the lowest part of the Rust hierarchy. This is done so that `accelerate` with its grab-bag of accelerators from all over the place does not need to be the lowest part of the stack. Over time, it's likely that everything will start touching `circuit` anyway. - `qiskit._qasm2` and `qiskit._qasm3` respectively became `qiskit._accelerate.qasm2` and `qiskit._accelerate.qasm3`, since they no longer get their own extension modules. - `qasm3` no longer stores a Python object on itself during module initialisation that depended on `qiskit.circuit.library` to create. Now, the Python-space `qiskit.qasm3` does that and the `qasm3` crate just retrieves that at Python runtime, since that crate requires Qiskit to be importable anyway. * Fix and document Rust test harness * Fix new clippy complaints These don't appear to be new from this patch series, but changing the access modifiers on some of the crates seems to make clippy complain more vociferously about some things. * Fix pylint import order * Remove erroneous comment * Fix typos Co-authored-by: Kevin Hartman <kevin@hart.mn> * Comment on `extension-module` * Unify naming of `qiskit-pyext` * Remove out-of-date advice on starting new crates --------- Co-authored-by: Kevin Hartman <kevin@hart.mn>
2024-04-18 03:52:07 +08:00
"crates/pyext/Cargo.toml",
binding=Binding.PyO3,
Add Rust-based OpenQASM 2 converter (#9784) * Add Rust-based OpenQASM 2 converter This is a vendored version of qiskit-qasm2 (https://pypi.org/project/qiskit-qasm2), with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package. This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construction `QuantumCircuit`s that are not correct) and more extensible. The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions. Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example: - the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal. - the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary. - there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor. I'll probably keep playing with some of those in the `qiskit-qasm2` package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one: In [1]: import qiskit.qasm2 ...: prog = """ ...: OPENQASM 2.0; ...: include "qelib1.inc"; ...: qreg q[2]; ...: """ ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000 ...: %timeit qiskit.qasm2.loads(prog) ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog) 2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) `cx`-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time of `CXGate` is higher than most gates, and this dominates the execution time for the Rust-based parser. * Work around docs failure on Sphinx 5.3, Python 3.9 The version of Sphinx that we're constrained to use in the docs build can't handle the `Unpack` operator, so as a temporary measure we can just relax the type hint a little. * Remove unused import * Tweak documentation * More specific PyO3 usage * Use PathBuf directly for paths * Format * Freeze dataclass * Use type-safe id types This should have no impact on runtime or on memory usage, since each of the new types has the same bit width and alignment as the `usize` values they replace. * Documentation tweaks * Fix comments in lexer * Fix lexing version number with separating comments * Add test of pathological formatting * Fixup release note * Fix handling of u0 gate * Credit reviewers Co-authored-by: Luciano Bello <bel@zurich.ibm.com> Co-authored-by: Kevin Hartman <kevin@hart.mn> Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com> * Add test of invalid gate-body statements * Refactor custom built-in gate definitions The previous system was quite confusing, and required all accesses to the global symbol table to know that the `Gate` symbol could be present but overridable. This led to confusing logic, various bugs and unnecessary constraints, such as it previously being (erroneously) possible to provide re-definitions for any "built-in" gate. Instead, we keep a separate store of instructions that may be redefined. This allows the logic to be centralised to only to the place responsible for performing those overrides, and remains accessible for error-message builders to query in order to provide better diagnostics. * Credit Sasha Co-authored-by: Alexander Ivrii <alexi@il.ibm.com> * Credit Matthew Co-authored-by: Matthew Treinish <mtreinish@kortar.org> * Remove dependency on `lazy_static` For a hashset of only 6 elements that is only checked once, there's not really any point to pull in an extra dependency or use a hash set at all. * Update PyO3 version --------- Co-authored-by: Luciano Bello <bel@zurich.ibm.com> Co-authored-by: Kevin Hartman <kevin@hart.mn> Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com> Co-authored-by: Alexander Ivrii <alexi@il.ibm.com> Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
2023-04-13 00:00:54 +08:00
debug=rust_debug,
Add infrastructure for gates, instruction, and operations in Rust (#12459) * Add infrastructure for gates, instruction, and operations in Rust This commit adds a native representation of Gates, Instruction, and Operations to rust's circuit module. At a high level this works by either wrapping the Python object in a rust wrapper struct that tracks metadata about the operations (name, num_qubits, etc) and then for other details it calls back to Python to get dynamic details like the definition, matrix, etc. For standard library gates like Swap, CX, H, etc this replaces the on-circuit representation with a new rust enum StandardGate. The enum representation is much more efficient and has a minimal memory footprint (just the enum variant and then any parameters or other mutable state stored in the circuit instruction). All the gate properties such as the matrix, definiton, name, etc are statically defined in rust code based on the enum variant (which represents the gate). The use of an enum to represent standard gates does mean a change in what we store on a CircuitInstruction. To represent a standard gate fully we need to store the mutable properties of the existing Gate class on the circuit instruction as the gate by itself doesn't contain this detail. That means, the parameters, label, unit, duration, and condition are added to the rust side of circuit instrucion. However no Python side access methods are added for these as they're internal only to the Rust code. In Qiskit 2.0 to simplify this storage we'll be able to drop, unit, duration, and condition from the api leaving only label and parameters. But for right now we're tracking all of the fields. To facilitate working with circuits and gates full from rust the setting the `operation` attribute of a `CircuitInstruction` object now transltates the python object to an internal rust representation. For standard gates this translates it to the enum form described earlier, and for other circuit operations 3 new Rust structs: PyGate, PyInstruction, and PyOperation are used to wrap the underlying Python object in a Rust api. These structs cache some commonly accessed static properties of the operation, such as the name, number of qubits, etc. However for dynamic pieces, such as the definition or matrix, callback to python to get a rust representation for those. Similarly whenever the `operation` attribute is accessed from Python it converts it back to the normal Python object representation. For standard gates this involves creating a new instance of a Python object based on it's internal rust representation. For the wrapper structs a reference to the wrapped PyObject is returned. To manage the 4 variants of operation (`StandardGate`, `PyGate`, `PyInstruction`, and `PyOperation`) a new Rust trait `Operation` is created that defines a standard interface for getting the properties of a given circuit operation. This common interface is implemented for the 4 variants as well as the `OperationType` enum which wraps all 4 (and is used as the type for `CircuitInstruction.operation` in the rust code. As everything in the `QuantumCircuit` data model is quite coupled moving the source of truth for the operations to exist in Rust means that more of the underlying `QuantumCircuit`'s responsibility has to move to Rust as well. Primarily this involves the `ParameterTable` which was an internal class for tracking which instructions in the circuit have a `ParameterExpression` parameter so that when we go to bind parameters we can lookup which operations need to be updated with the bind value. Since the representation of those instructions now lives in Rust and Python only recieves a ephemeral copy of the instructions the ParameterTable had to be reimplemented in Rust to track the instructions. This new parameter table maps the Parameter's uuid (as a u128) as a unique identifier for each parameter and maps this to a positional index in the circuit data to the underlying instruction using that parameter. This is a bit different from the Python parameter table which was mapping a parameter object to the id of the operation object using that parmaeter. This also leads to a difference in the binding mechanics as the parameter assignment was done by reference in the old model, but now we need to update the entire instruction more explicitly in rust. Additionally, because the global phase of a circuit can be parameterized the ownership of global phase is moved from Python into Rust in this commit as well. After this commit the only properties of a circuit that are not defined in Rust for the source of truth are the bits (and vars) of the circuit, and when creating circuits from rust this is what causes a Python interaction to still be required. This commit does not translate the full standard library of gates as that would make the pull request huge, instead this adds the basic infrastructure for having a more efficient standard gate representation on circuits. There will be follow up pull requests to add the missing gates and round out support in rust. The goal of this pull request is primarily to add the infrastructure for representing the full circuit model (and dag model in the future) in rust. By itself this is not expected to improve runtime performance (if anything it will probably hurt performance because of extra type conversions) but it is intended to enable writing native circuit manipulations in Rust, including transpiler passes without needing involvement from Python. Longer term this should greatly improve the runtime performance and reduce the memory overhead of Qiskit. But, this is just an early step towards that goal, and is more about unlocking the future capability. The next steps after this commit are to finish migrating the standard gate library and also update the `QuantumCircuit` methods to better leverage the more complete rust representation (which should help offset the performance penalty introduced by this). Fixes: #12205 * Fix Python->Rust Param conversion This commit adds a custom implementation of the FromPyObject trait for the Param enum. Previously, the Param trait derived it's impl of the trait, but this logic wasn't perfect. In cases whern a ParameterExpression was effectively a constant (such as `0 * x`) the trait's attempt to coerce to a float first would result in those ParameterExpressions being dropped from the circuit at insertion time. This was a change in behavior from before having gates in Rust as the parameters would disappear from the circuit at insertion time instead of at bind time. This commit fixes this by having a custom impl for FromPyObject that first tries to figure out if the parameter is a ParameterExpression (or a QuantumCircuit) by using a Python isinstance() check, then tries to extract it as a float, and finally stores a non-parameter object; which is a new variant in the Param enum. This new variant also lets us simplify the logic around adding gates to the parameter table as we're able to know ahead of time which gate parameters are `ParameterExpression`s and which are other objects (and don't need to be tracked in the parameter table. Additionally this commit tweaks two tests, the first is test.python.circuit.library.test_nlocal.TestNLocal.test_parameters_setter which was adjusted in the previous commit to workaround the bug fixed by this commit. The second is test.python.circuit.test_parameters which was testing that a bound ParameterExpression with a value of 0 defaults to an int which was a side effect of passing an int input to symengine for the bind value and not part of the api and didn't need to be checked. This assertion was removed from the test because the rust representation is only storing f64 values for the numeric parameters and it is never an int after binding from the Python perspective it isn't any different to have float(0) and int(0) unless you explicit isinstance check like the test previously was. * Fix qasm3 exporter for std gates without stdgates.inc This commit fixes the handling of standard gates in Qiskit when the user specifies excluding the use of the stdgates.inc file from the exported qasm. Previously the object id of the standard gates were used to maintain a lookup table of the global definitions for all the standard gates explicitly in the file. However, the rust refactor means that every time the exporter accesses `circuit.data[x].operation` a new instance is returned. This means that on subsequent lookups for the definition the gate definitions are never found. To correct this issue this commit adds to the lookup table a fallback of the gate name + parameters to do the lookup for. This should be unique for any standard gate and not interfere with the previous logic that's still in place and functional for other custom gate definitions. While this fixes the logic in the exporter the test is still failing because the test is asserting the object ids are the same in the qasm3 file, which isn't the case anymore. The test will be updated in a subsequent commit to validate the qasm3 file is correct without using a hardcoded object id. * Fix base scheduler analysis pass duration setting When ALAPScheduleAnalysis and ASAPScheduleAnalysis were setting the duration of a gate they were doing `node.op.duration = duration` this wasn't always working because if `node.op` was a standard gate it returned a new Python object created from the underlying rust representation. This commit fixes the passes so that they modify the duration and then explicit set the operation to update it's rust representation. * Fix python lint * Fix last failing qasm3 test for std gates without stdgates.inc While the logic for the qasm3 exporter was fixed in commit a6e69ba4c99cdd2fa50ed10399cada322bc88903 to handle the edge case of a user specifying that the qasm exporter does not use the stdgates.inc include file in the output, but also has qiskit's standard gates in their circuit being exported. The one unit test to provide coverage for that scenario was not passing because when an id was used for the gate definitions in the qasm3 file it was being referenced against a temporary created by accessing a standard gate from the circuit and the ids weren't the same so the reference string didn't match what the exporter generated. This commit fixes this by changing the test to not do an exact string comparison, but instead a line by line comparison that either does exact equality check or a regex search for the expected line and the ids are checked as being any 15 character integer. * Remove superfluous comment * Cache imported classes with GILOnceCell * Remove unused python variables * Add missing file * Update QuantumCircuit gate methods to bypass Python object This commit updates the QuantumCircuit gate methods which add a given gate to the circuit to bypass the python gate object creation and directly insert a rust representation of the gate. This avoids a conversion in the rust side of the code. While in practice this is just the Python side object creation and a getattr for the rust code to determine it's a standard gate that we're skipping. This may add up over time if there are a lot of gates being created by the method. To accomplish this the rust code handling the mapping of rust StandardGate variants to the Python classes that represent those gates needed to be updated as well. By bypassing the python object creation we need a fallback to populate the gate class for when a user access the operation object from Python. Previously this mapping was only being populated at insertion time and if we never insert the python object (for a circuit created only via the methods) then we need a way to find what the gate class is. A static lookup table of import paths and class names are added to `qiskit_circuit::imports` module to faciliate this and helper functions are added to facilitate interacting with the class objects that represent each gate. * Deduplicate gate matrix definitions * Fix lint * Attempt to fix qasm3 test failure * Add compile time option to cache py gate returns for rust std gates This commit adds a new rust crate feature flag for the qiskit-circuits and qiskit-pyext that enables caching the output from CircuitInstruction.operation to python space. Previously, for memory efficiency we were reconstructing the python object on demand for every access. This was to avoid carrying around an extra pointer and keeping the ephemeral python object around longer term if it's only needed once. But right now nothing is directly using the rust representation yet and everything is accessing via the python interface, so recreating gate objects on the fly has a huge performance penalty. To avoid that this adds caching by default as a temporary solution to avoid this until we have more usage of the rust representation of gates. There is an inherent tension between an optimal rust representation and something that is performant for Python access and there isn't a clear cut answer on which one is better to optimize for. A build time feature lets the user pick, if what we settle on for the default doesn't agree with their priorities or use case. Personally I'd like to see us disable the caching longer term (hopefully before releasing this functionality), but that's dependent on a sufficent level of usage from rust superseding the current Python space usage in the core of Qiskit. * Add num_nonlocal_gates implementation in rust This commit adds a native rust implementation to rust for the num_nonlocal_gates method on QuantumCircuit. Now that we have a rust representation of gates it is potentially faster to do the count because the iteration and filtering is done rust side. * Performance tuning circuit construction This commit fixes some performance issues with the addition of standard gates to a circuit. To workaround potential reference cycles in Python when calling rust we need to check the parameters of the operation. This was causing our fast path for standard gates to access the `operation` attribute to get the parameters. This causes the gate to be eagerly constructed on the getter. However, the reference cycle case can only happen in situations without a standard gate, and the fast path for adding standard gates directly won't need to run this so a skip is added if we're adding a standard gate. * Add back validation of parameters on gate methods In the previous commit a side effect of the accidental eager operation creation was that the parameter input for gates were being validated by that. By fixing that in the previous commit the validation of input parameters on the circuit methods was broken. This commit fixes that oversight and adds back the validation. * Skip validation on gate creation from rust * Offload operation copying to rust This commit fixes a performance regression in the `QuantumCircuit.copy()` method which was previously using Python to copy the operations which had extra overhead to go from rust to python and vice versa. This moves that logic to exist in rust and improve the copy performance. * Fix lint * Perform deepcopy in rust This commit moves the deepcopy handling to occur solely in Rust. Previously each instruction would be directly deepcopied by iterating over the circuit data. However, we can do this rust side now and doing this is more efficient because while we need to rely on Python to run a deepcopy we can skip it for the Rust standard gates and rely on Rust to copy those gates. * Fix QuantumCircuit.compose() performance regression This commit fixes a performance regression in the compose() method. This was caused by the checking for classical conditions in the method requiring eagerly converting all standard gates to a Python object. This changes the logic to do this only if we know we have a condition (which we can determine Python side now). * Fix map_ops test case with no caching case * Fix typos in docs This commit fixes several docs typos that were caught during code review. Co-authored-by: Eli Arbel <46826214+eliarbel@users.noreply.github.com> * Shrink memory usage for extra mutable instruction state This commit changes how we store the extra mutable instruction state (condition, duration, unit, and label) for each `CircuitInstruction` and `PackedInstruction` in the circuit. Previously it was all stored as separate `Option<T>` fields on the struct, which required at least a pointer's width for each field which was wasted space the majority of the time as using these fields are not common. To optimize the memory layout of the struct this moves these attributes to a new struct which is put in an `Option<Box<_>>` which reduces it from 4 pointer widths down to 1 per object. This comes from extra runtime cost from the extra layer of pointer indirection but as this is the uncommon path this tradeoff is fine. * Remove Option<> from params field in CircuitInstruction This commit removes the Option<> from the params field in CircuitInstruction. There is no real distinction between an empty vec and None in this case, so the option just added another layer in the API that we didn't need to deal with. Also depending on the memory alignment using an Option<T> might have ended up in a little extra memory usage too, so removing it removes that potential source of overhead. * Eagerly construct rust python wrappers in .append() This commit updates the Python code in QuantumCircuit.append() method to eagerly construct the rust wrapper objects for python defined circuit operations. * Simplify code around handling python errors in rust * Revert "Skip validation on gate creation from rust" This reverts commit 2f81bde8bf32b06b4165048896eabdb36470814d. The validation skipping was unsound in some cases and could lead to invalid circuit being generated. If we end up needing this as an optimization we can remove this in the future in a follow-up PR that explores this in isolation. * Temporarily use git for qasm3 import In Qiskit/qiskit-qasm3-import#34 the issue we're hitting caused by qiskit-qasm3-import using the private circuit attributes removed in this PR was fixed. This commit temporarily moves to installing it from git so we can fully run CI. When qiskit-qasm3-import is released we should revert this commit. * Fix lint * Fix lint for real (we really need to use a py312 compatible version of pylint) * Fix test failure caused by incorrect lint fix * Relax trait-method typing requirements * Encapsulate `GILOnceCell` initialisers to local logic * Simplify Interface for building circuit of standard gates in rust * Simplify complex64 creation in gate_matrix.rs This just switches Complex64::new(re, im) to be c64(re, im) to reduce the amount of typing. c64 needs to be defined inplace so it can be a const fn. * Simplify initialization of array of elements that are not Copy (#28) * Simplify initialization of array of elements that are not Copy * Only generate array when necessary * Fix doc typos Co-authored-by: Kevin Hartman <kevin@hart.mn> * Add conversion trait for OperationType -> OperationInput and simplify CircuitInstruction::replace() * Use destructuring for operation_type_to_py extra attr handling * Simplify trait bounds for map_indices() The map_indices() method previously specified both Iterator and ExactSizeIterator for it's trait bounds, but Iterator is a supertrait of ExactSizeIterator and we don't need to explicitly list both. This commit removes the duplicate trait bound. * Make Qubit and Clbit newtype member public As we start to use Qubit and Clbit for creating circuits from accelerate and other crates in the Qiskit workspace we need to be able to create instances of them. However, the newtype member BitType was not public which prevented creating new Qubits. This commit fixes this by making it public. * Use snakecase for gate matrix names * Remove pointless underscore prefix * Use downcast instead of bound * Rwork _append reference cycle handling This commit reworks the multiple borrow handling in the _append() method to leveraging `Bound.try_borrow()` to return a consistent error message if we're unable to borrow a CircuitInstruction in the rust code meaning there is a cyclical reference in the code. Previously we tried to detect this cycle up-front which added significant overhead for a corner case. * Make CircuitData.global_phase_param_index a class attr * Use &[Param] instead of &SmallVec<..> for operation_type_and_data_to_py * Have get_params_unsorted return a set * Use lookup table for static property methods of StandardGate * Use PyTuple::empty_bound() * Fix lint * Add missing test method docstring * Reuse allocations in parameter table update * Remove unnecessary global phase zeroing * Move manually set params to a separate function * Fix release note typo * Use constant for global-phase index * Switch requirement to release version --------- Co-authored-by: Eli Arbel <46826214+eliarbel@users.noreply.github.com> Co-authored-by: Jake Lishman <jake.lishman@ibm.com> Co-authored-by: John Lapeyre <jlapeyre@users.noreply.github.com> Co-authored-by: Kevin Hartman <kevin@hart.mn>
2024-06-13 18:48:40 +08:00
features=features,
Refactor Rust crates to build a single extension module (#12134) * Refactor Rust crates to build a single extension module Previously we were building three different Python extension modules out of Rust space. This was convenient for logical separation of the code, and for making the incremental compilations of the smaller crates faster, but had the disadvantage that `qasm2` and `qasm3` couldn't access the new circuit data structures directly from Rust space and had to go via Python. This modifies the crate structure to put the Rust-space acceleration logic into crates that we only build as Rust libraries, then adds another (`pyext`) crate to be the only that that actually builds as shared Python C extension. This then lets the Rust crates interact with each other (and the Rust crates still contain PyO3 Python-binding logic internally) and accept and output each others' Python types. The one Python extension is still called `qiskit._accelerate`, but it's built by the Rust crate `qiskit-pyext`. This necessitated some further changes, since `qiskit._accelerate` now contains acccelerators for lots of parts of the Qiskit package hierarchy, but needs to have a single low-level place to initialise itself: - The circuit logic `circuit` is separated out into its own separate crate so that this can form the lowest part of the Rust hierarchy. This is done so that `accelerate` with its grab-bag of accelerators from all over the place does not need to be the lowest part of the stack. Over time, it's likely that everything will start touching `circuit` anyway. - `qiskit._qasm2` and `qiskit._qasm3` respectively became `qiskit._accelerate.qasm2` and `qiskit._accelerate.qasm3`, since they no longer get their own extension modules. - `qasm3` no longer stores a Python object on itself during module initialisation that depended on `qiskit.circuit.library` to create. Now, the Python-space `qiskit.qasm3` does that and the `qasm3` crate just retrieves that at Python runtime, since that crate requires Qiskit to be importable anyway. * Fix and document Rust test harness * Fix new clippy complaints These don't appear to be new from this patch series, but changing the access modifiers on some of the crates seems to make clippy complain more vociferously about some things. * Fix pylint import order * Remove erroneous comment * Fix typos Co-authored-by: Kevin Hartman <kevin@hart.mn> * Comment on `extension-module` * Unify naming of `qiskit-pyext` * Remove out-of-date advice on starting new crates --------- Co-authored-by: Kevin Hartman <kevin@hart.mn>
2024-04-18 03:52:07 +08:00
)
],
Use stable Python C API for building Rust extension (#10120) * Use stable Python C API for building Rust extension This commit tweaks the rust extension code to start using the PyO3 abi3 flag to build binaries that are compatible with all python versions, not just a single release. Previously, we were building against the version specific C API and that resulted in needing abinary file for each supported python version on each supported platform/architecture. By using the abi3 feature flag and marking the wheels as being built with the limited api we can reduce our packaging overhead to just having one wheel file per supported platform/architecture. The only real code change needed here was to update the memory marginalization function. PyO3's abi3 feature is incompatible with returning a big int object from rust (the C API they use for that conversion isn't part of the stable C API). So this commit updates the function to convert to create a python int manually using the PyO3 api where that was being done before. Co-authored-by: Jake Lishman <jake.lishman@ibm.com> * Set minimum version on abi3 flag to Python 3.8 * Fix lint * Use py_limited_api="auto" on RustExtension According to the docs for the setuptools-rust RustExtension class: https://setuptools-rust.readthedocs.io/en/latest/reference.html#setuptools_rust.RustExtension The best setting to use for the py_limited_api argument is `"auto"` as this will use the setting in the PyO3 module to determine the correct value to set. This commit updates the setup.py to follow the recommendation in the docs. * Update handling of phase input to expval rust calls The pauli_expval module in Rust that Statevector and DensityMatrix leverage when computing defines the input type of the phase argument as Complex64. Previously, the quantum info code in the Statevector and DensityMatrix classes were passing in a 1 element ndarray for this parameter. When using the the version specific Python C API in pyo3 it would convert the single element array to a scalar value. However when using abi3 this handling was not done (or was not done correctly) and this caused the tests to fail. This commit updates the quantum info module to pass the phase as a complex value instead of a 1 element numpy array to bypass this behavior change in PyO3 when using abi3. Co-authored-by: Jake Lishman <jake.lishman@ibm.com> * Set py_limited_api explicitly to True * DNM: Test cibuildwheel works with abi3 * Add abi3audit to cibuildwheel repair step * Force setuptools to use abi3 tag * Add wheel to sdist build * Workaround abiaudit3 not moving wheels and windows not having a default repair command * Add source of setup.py hack * Add comment about pending pyo3 abi3 bigint support * Revert "DNM: Test cibuildwheel works with abi3" This reverts commit 8ca24cf1e4044e28dd96335d04b9b344d1019c60. * Add release note * Simplify setting abi3 tag in built wheels * Update releasenotes/notes/use-abi3-4a935e0557d3833b.yaml Co-authored-by: Jake Lishman <jake@binhbar.com> * Update release note * Update releasenotes/notes/use-abi3-4a935e0557d3833b.yaml --------- Co-authored-by: Jake Lishman <jake.lishman@ibm.com> Co-authored-by: Jake Lishman <jake@binhbar.com>
2023-06-12 21:45:27 +08:00
options={"bdist_wheel": {"py_limited_api": "cp38"}},
)