Multiprocessing in COSMIC¶
Both sampling and evolution in COSMIC can be parallelised across multiple CPU cores, which can
significantly reduce runtime when working with large populations. This guide explains how to use
multiprocessing for the independent sampler and for evolve().
There are two ways to enable multiprocessing in COSMIC:
Pass nproc — an integer specifying how many worker processes to use. COSMIC will create a pool internally, use it, and then shut it down automatically.
Pass an existing pool — a multiprocessing.Pool (or any pool-like object with a map and starmap interface) that you have already created. COSMIC will use it but will not close it, so you remain in control of when to close it.
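The pool-like interface COSMIC needs is small: just map and starmap. As a sketch of that requirement, a hypothetical serial stand-in like the following would satisfy it, which can be handy for debugging without spawning any processes (SerialPool is not part of COSMIC; it is purely illustrative):

```python
class SerialPool:
    """Minimal pool-like object: provides the map/starmap interface
    that COSMIC expects, but runs everything in the current process."""

    def map(self, func, iterable):
        # Apply func to each item in turn, serially
        return [func(item) for item in iterable]

    def starmap(self, func, iterable):
        # Like map, but unpacks each item as positional arguments
        return [func(*args) for args in iterable]
```

For real parallelism you would pass a multiprocessing.Pool instead, as shown in the sections below.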
Note
Multiprocessing in Python requires that parallelised code runs inside an
if __name__ == "__main__": guard when using a non-fork start method
(spawn is the default on Windows and macOS). All of the examples below define a main() function
and call it under this guard. If you are running interactively in a Jupyter notebook, you
may need to use a workaround such as multiprocessing.get_context("fork") or simply
set nproc=1.
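For example, the fork workaround mentioned above might look like the following sketch. Note that "fork" is only available on POSIX systems (Linux and macOS); get_context("fork") raises a ValueError on Windows:

```python
import multiprocessing

def cube(x):
    return x ** 3

# With the "fork" start method, workers inherit the parent's state,
# so interactively defined functions work without a __main__ guard.
ctx = multiprocessing.get_context("fork")  # POSIX only
with ctx.Pool(2) as pool:
    print(pool.map(cube, [1, 2, 3]))
```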
How many cores should you use?¶
How many do you have?¶
Before deciding on nproc, it is worth knowing how many CPU cores are available on your
machine. You can check this with Python’s os or multiprocessing modules:
import os
import multiprocessing
print(os.cpu_count()) # logical cores (including hyperthreads)
print(multiprocessing.cpu_count()) # same value, via the multiprocessing module
The number reported includes hyperthreaded logical cores. For CPU-bound work like binary evolution the physical core count is often more relevant, though this varies by machine and workload.
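A common (if rough) heuristic when choosing nproc is to leave one core free for the operating system and other processes, for example:

```python
import os

# Leave one logical core free for the OS and other processes;
# os.cpu_count() can return None, hence the fallback to 1.
nproc = max(1, (os.cpu_count() or 1) - 1)
print(nproc)
```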
More cores is not always faster¶
Spawning worker processes has a fixed overhead cost: Python needs to start each worker, import modules, and copy data to and from the workers. For small populations this overhead can easily exceed any time saved from parallelism, making multiprocessing slower than a single process.
As a rough guide you will see speedups from multiprocessing once you have at least a few thousand binaries, but the exact threshold depends on your machine and the details of your workload.
The best way to know whether more cores helps your specific workload is to benchmark it directly. Here is a simple timing script you can adapt:
BSEDict = {
    "pts1": 0.001, "pts2": 0.01, "pts3": 0.02, "zsun": 0.014, "windflag": 3,
    "eddlimflag": 0, "neta": 0.5, "bwind": 0.0, "hewind": 0.5, "beta": 0.125,
    "xi": 0.5, "acc2": 1.5, "LBV_flag": 1, "alpha1": 1.0, "lambdaf": 0.0,
    "ceflag": 1, "cekickflag": 2, "cemergeflag": 1, "cehestarflag": 0,
    "qcflag": 5,
    "qcrit_array": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "kickflag": 5, "sigma": 265.0, "bhflag": 1, "bhsigmafrac": 1.0,
    "sigmadiv": -20.0, "ecsn": 2.25, "ecsn_mlow": 1.6, "aic": 1, "ussn": 1,
    "polar_kick_angle": 90.0,
    "natal_kick_array": [[-100.0, -100.0, -100.0, -100.0, 0.0], [-100.0, -100.0, -100.0, -100.0, 0.0]],
    "mm_mu_ns": 400.0, "mm_mu_bh": 200.0, "remnantflag": 4,
    "fryer_mass_limit": 0, "mxns": 3.0, "rembar_massloss": 0.5,
    "wd_mass_lim": 1, "maltsev_mode": 0, "maltsev_fallback": 0.5,
    "maltsev_pf_prob": 0.1, "pisn": -2, "ppi_co_shift": 0.0,
    "ppi_extra_ml": 0.0, "bhspinflag": 0, "bhspinmag": 0.0, "grflag": 1,
    "eddfac": 10, "gamma": -2, "don_lim": -1, "acc_lim": -1, "tflag": 1,
    "ST_tide": 1,
    "fprimc_array": [2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0],
    "ifflag": 1, "wdflag": 1, "epsnov": 0.001, "bdecayfac": 1,
    "bconst": 3000, "ck": 1000, "rejuv_fac": 1.0, "rejuvflag": 0,
    "bhms_coll_flag": 0, "htpmb": 1, "ST_cr": 1, "rtmsflag": 0,
}
import time
import multiprocessing

from cosmic.sample.initialbinarytable import InitialBinaryTable
from cosmic.evolve import Evolve

def benchmark(size=50000, n_cores=1):
    with multiprocessing.Pool(n_cores) as pool:
        t0 = time.perf_counter()
        ibt = InitialBinaryTable.sampler(
            'independent', [13, 14], [13, 14],
            binfrac_model=0.5, primary_model='kroupa01',
            ecc_model='sana12', porb_model='sana12',
            qmin=-1, SF_start=13700.0, SF_duration=0.0,
            met=0.02, size=size, pool=pool,
        )[0]
        print(f"  Sampling {size} binaries with {n_cores} cores took {time.perf_counter() - t0:.1f}s")

        t0 = time.perf_counter()
        Evolve.evolve(initialbinarytable=ibt, BSEDict=BSEDict, pool=pool)
        elapsed = time.perf_counter() - t0
        print(f"  Evolving {size} binaries with {n_cores} cores took {elapsed:.1f}s")

if __name__ == "__main__":
    for size in [10, 1000]:
        print(f"Benchmarking with population size {size}:")
        for n in [1, 2, 4, 8, 16]:
            benchmark(size=size, n_cores=n)
        print()
Running this on your machine will show you the point of diminishing returns. You may find that 4 cores is nearly as fast as 16 for a small population, while for a large population 16 cores is much faster than 4.
Tip
For most research-quality simulations you'll likely just want to use the maximum number of cores available. Benchmarking is still useful if you want to estimate how long a run will take, or if you are on a shared cluster where you must request a specific number of cores and want to avoid requesting more than you need.
Parallelising the independent sampler¶
Using nproc¶
The simplest way to parallelise sampling is to pass nproc directly to
sampler(). COSMIC splits the
requested sample into chunks, distributes them across nproc worker processes, and combines
the results.
from cosmic.sample.initialbinarytable import InitialBinaryTable

def main():
    ibt, mass_singles, mass_binaries, n_singles, n_binaries = \
        InitialBinaryTable.sampler(
            'independent', [13, 14], [13, 14],
            binfrac_model=0.5, primary_model='kroupa01',
            ecc_model='sana12', porb_model='sana12',
            qmin=-1, SF_start=13700.0, SF_duration=0.0,
            met=0.02, size=100_000,
            nproc=4,
        )
    print(ibt)

if __name__ == "__main__":
    main()
COSMIC will only spawn as many chunks as make sense given the requested sample size — if the
sample is too small to benefit from parallelism it will silently cap the number of chunks, so
passing a large nproc on a small sample is harmless.
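To make the chunking idea concrete, a near-equal split that never creates more chunks than samples might look like the following sketch (purely illustrative; COSMIC's internal chunking logic may differ):

```python
def chunk_sizes(total, n_workers):
    """Split `total` samples into at most `n_workers` near-equal chunks,
    capping the number of chunks for very small samples."""
    n_chunks = min(n_workers, total)
    base, rem = divmod(total, n_chunks)
    # Distribute any remainder one extra sample at a time
    return [base + (1 if i < rem else 0) for i in range(n_chunks)]

print(chunk_sizes(10, 4))   # four near-equal chunks
print(chunk_sizes(3, 16))   # capped at three chunks for a tiny sample
```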
Using an existing pool¶
If you are already managing a multiprocessing.Pool — for example because you want to
reuse it across multiple sampling calls, or with evolution as well — you can pass it via the pool argument instead.
When a pool is supplied, the nproc argument is ignored.
from multiprocessing import Pool

from cosmic.sample.initialbinarytable import InitialBinaryTable

def main():
    with Pool(4) as pool:
        ibt, mass_singles, mass_binaries, n_singles, n_binaries = \
            InitialBinaryTable.sampler(
                'independent', [13, 14], [13, 14],
                binfrac_model=0.5, primary_model='kroupa01',
                ecc_model='sana12', porb_model='sana12',
                qmin=-1, SF_start=13700.0, SF_duration=0.0,
                met=0.02, size=100_000,
                pool=pool,
            )
    print(ibt)

if __name__ == "__main__":
    main()
Parallelising evolution¶
Using nproc¶
evolve() accepts an nproc keyword argument. Each binary is
evolved independently, so the workload is split across nproc worker processes automatically.
BSEDict = {
    "pts1": 0.001, "pts2": 0.01, "pts3": 0.02, "zsun": 0.014, "windflag": 3,
    "eddlimflag": 0, "neta": 0.5, "bwind": 0.0, "hewind": 0.5, "beta": 0.125,
    "xi": 0.5, "acc2": 1.5, "LBV_flag": 1, "alpha1": 1.0, "lambdaf": 0.0,
    "ceflag": 1, "cekickflag": 2, "cemergeflag": 1, "cehestarflag": 0,
    "qcflag": 5,
    "qcrit_array": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "kickflag": 5, "sigma": 265.0, "bhflag": 1, "bhsigmafrac": 1.0,
    "sigmadiv": -20.0, "ecsn": 2.25, "ecsn_mlow": 1.6, "aic": 1, "ussn": 1,
    "polar_kick_angle": 90.0,
    "natal_kick_array": [[-100.0, -100.0, -100.0, -100.0, 0.0], [-100.0, -100.0, -100.0, -100.0, 0.0]],
    "mm_mu_ns": 400.0, "mm_mu_bh": 200.0, "remnantflag": 4,
    "fryer_mass_limit": 0, "mxns": 3.0, "rembar_massloss": 0.5,
    "wd_mass_lim": 1, "maltsev_mode": 0, "maltsev_fallback": 0.5,
    "maltsev_pf_prob": 0.1, "pisn": -2, "ppi_co_shift": 0.0,
    "ppi_extra_ml": 0.0, "bhspinflag": 0, "bhspinmag": 0.0, "grflag": 1,
    "eddfac": 10, "gamma": -2, "don_lim": -1, "acc_lim": -1, "tflag": 1,
    "ST_tide": 1,
    "fprimc_array": [2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0, 2.0/21.0],
    "ifflag": 1, "wdflag": 1, "epsnov": 0.001, "bdecayfac": 1,
    "bconst": 3000, "ck": 1000, "rejuv_fac": 1.0, "rejuvflag": 0,
    "bhms_coll_flag": 0, "htpmb": 1, "ST_cr": 1, "rtmsflag": 0,
}
from cosmic.sample.initialbinarytable import InitialBinaryTable
from cosmic.evolve import Evolve

def main():
    ibt, mass_singles, mass_binaries, n_singles, n_binaries = \
        InitialBinaryTable.sampler(
            'independent', [13, 14], [13, 14],
            binfrac_model=0.5, primary_model='kroupa01',
            ecc_model='sana12', porb_model='sana12',
            qmin=-1, SF_start=13700.0, SF_duration=0.0,
            met=0.02, size=100,
        )
    bpp, bcm, initC, kick_info = Evolve.evolve(
        initialbinarytable=ibt,
        BSEDict=BSEDict,
        nproc=4,
    )
    print(bpp)

if __name__ == "__main__":
    main()
Using an existing pool¶
If you want to reuse the same pool for evolution — or share one across sampling and evolution —
pass the pool object as the pool argument. Like the sampler, evolve()
will use the pool but leave it open when it finishes.
from multiprocessing import Pool

from cosmic.sample.initialbinarytable import InitialBinaryTable
from cosmic.evolve import Evolve

def main():
    with Pool(4) as pool:
        ibt, mass_singles, mass_binaries, n_singles, n_binaries = \
            InitialBinaryTable.sampler(
                'independent', [13, 14], [13, 14],
                binfrac_model=0.5, primary_model='kroupa01',
                ecc_model='sana12', porb_model='sana12',
                qmin=-1, SF_start=13700.0, SF_duration=0.0,
                met=0.02, size=100,
                pool=pool,
            )
        bpp, bcm, initC, kick_info = Evolve.evolve(
            initialbinarytable=ibt,
            BSEDict=BSEDict,
            pool=pool,
        )
    print(bpp)

if __name__ == "__main__":
    main()
Sharing a single pool avoids the overhead of spawning and tearing down worker processes twice, which can be a meaningful saving when running many iterations.
Tip
If you are running a parameter sweep and calling evolve() many
times with different physics settings, creating one pool at the start and passing it to each
call is more efficient than relying on nproc.
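As a generic sketch of that pattern, with a toy function standing in for COSMIC's sampling and evolve calls (which are not shown here):

```python
from multiprocessing import Pool

def simulate(x):
    # Stand-in for per-binary work; in practice COSMIC's own
    # functions would be dispatched through the shared pool.
    return x * x

def main():
    settings = [2, 3, 5]  # stand-ins for different physics settings
    with Pool(4) as pool:  # created once, reused for every iteration
        for s in settings:
            results = pool.map(simulate, range(s))
            print(s, results)

if __name__ == "__main__":
    main()
```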
Wrapping up¶
Multiprocessing in COSMIC comes down to passing nproc or supplying a pool directly. The key takeaways:
Use nproc for the simplest case — COSMIC handles pool creation and cleanup automatically.
Use an existing pool when you want fine-grained control over worker processes or need to share the same pool across sampling and evolution.
Both approaches work identically for the independent sampler and for evolve().
Always define a main() function and call it under if __name__ == "__main__": to avoid issues with Python's multiprocessing start methods on macOS and Windows.
Benchmark before scaling up — more cores only helps once your population is large enough to outweigh the overhead of spawning workers.