Transfer data
This guide shows how to transfer data from a source database instance into the current default database instance.
# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin init --storage ./test-transfer --schema bionty
→ connected lamindb: anonymous/test-transfer
import lamindb as ln
ln.track("ITeOtm7bhtdq0000")
→ connected lamindb: anonymous/test-transfer
→ notebook imports: lamindb==0.76.15
→ created Transform('ITeOtm7b'), started new Run('dLHC4Mb8') at 2024-11-05 13:03:23 UTC
Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.
# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)
# convert the QuerySet to a DataFrame and show its first 5 rows
artifacts.df().head()
! source schema has additional modules: {'wetlab'}
consider mounting these schema modules to transfer all metadata
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
939 | 8IqP9rziJg1XHlLC0000 | None | True | requirements.txt | None | .txt | None | 11917 | wpKsUh6VC3a6fLA3yXoK7A | NaN | NaN | md5 | None | 0 | True | 2 | NaN | NaN | 2024-10-18 21:58:29.619646+00:00 | 9 |
940 | fsRsnfvzoVdm1LCq0000 | None | True | Report of run LIovDODHNelubirtzOq1 | None | .html | None | 340579 | 3sCCDozNGzYj3pO0ft-h3g | NaN | NaN | md5 | None | 0 | True | 2 | NaN | NaN | 2024-10-18 21:58:30.765161+00:00 | 9 |
607 | sRapK07mMtToihzFeTaf | None | True | View Papalexi21 in Vitessce | None | .vitessce.json | None | 1527 | jfAtjNNzdvetUaEo5zhf0Q | NaN | NaN | md5 | None | 1 | True | 2 | 79.0 | 141.0 | 2024-04-30 12:51:16.348884+00:00 | 2 |
726 | HXJ4DDAw8012jVKwoxgd | None | True | View Kuppe2022 in Vitessce | None | .vitessce.json | None | 5258 | JsVK8X8EGRsyTEMnD3Z-6g | NaN | NaN | md5 | None | 1 | True | 2 | 79.0 | 198.0 | 2024-06-26 10:35:31.697669+00:00 | 2 |
870 | JD7j66Cr6CDUGK8X0000 | None | True | requirements.txt | None | .txt | None | 8163 | iWmRrdYRJC7zjygVXCvMBA | NaN | NaN | md5 | None | 0 | True | 2 | NaN | NaN | 2024-09-25 12:37:33.582463+00:00 | 6 |
You can now further subset or search the QuerySet. Here we filter for artifacts whose description contains “Tabula Sapiens”.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', key='tabula_sapiens_lung.h5ad', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, created_at=2023-07-14 19:00:30 UTC)
Database instance
slug: laminlabs/lamindata
Provenance
.storage = 's3://lamindata'
.transform = 'Ingest Tabula Sapiens Lung'
.run = '2023-07-14 12:53:17 UTC'
.created_by = 'Koncopd'
Usage
.input_of_runs = 2023-07-15 17:12:16 UTC
Labels
.tissues = 'lung'
.cell_types = 'CD4-positive, alpha-beta T cell', 'CD8-positive, alpha-beta T cell', 'dendritic cell', 'B cell', 'fibroblast', 'non-classical monocyte', 'myofibroblast cell', 'capillary endothelial cell', 'vein endothelial cell', 'endothelial cell of lymphatic vessel', ...
.experimental_factors = 'anoxya', 'stroke'
.ulabels = 'TSP1', 'TSP2', 'TSP14'
By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.
artifact.save()
→ mapped records: Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
→ transferred records: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2'), CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', key='tabula_sapiens_lung.h5ad', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, storage_id=2, transform_id=2, run_id=2, created_by_id=1, created_at=2024-11-05 13:03:25 UTC)
How do I know if a record is saved in the default database instance or not?
Every record has an attribute ._state.db which can take the following values:
None: the record has not yet been saved to any database
"default": the record is saved on the default database instance
"account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
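The three cases above can be folded into a small helper. This is a minimal sketch: describe_db_state is a hypothetical name and not part of lamindb, and it only interprets the value you pass in. For a real record, that value would be record._state.db.

```python
from typing import Optional


def describe_db_state(db: Optional[str]) -> str:
    """Interpret a value of `record._state.db` (hypothetical helper, not part of lamindb)."""
    if db is None:
        return "not saved to any database"
    if db == "default":
        return "saved on the default database instance"
    # assumption: any other value is an "account/name" slug of a non-default instance
    return f"saved on the non-default database instance {db}"


print(describe_db_state(None))
print(describe_db_state("default"))
print(describe_db_state("laminlabs/lamindata"))
```

Checking this value before and after calling save() on a queried record is one way to confirm that a transfer happened.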
The artifact record and all other feature & label records have been transferred to the current database.
artifact.describe()
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', key='tabula_sapiens_lung.h5ad', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', _hash_type='sha1-fl', visibility=1, _key_is_virtual=False, created_at=2024-11-05 13:03:25 UTC)
Provenance
.storage = 's3://lamindata'
.transform = 'Transfer from `laminlabs/lamindata`'
.run = 2024-11-05 13:03:25 UTC
.created_by = 'anonymous'
Labels
.tissues = 'lung'
.cell_types = 'type I pneumocyte', 'adventitial cell', 'basal cell', 'non-classical monocyte', 'smooth muscle cell', 'CD4-positive, alpha-beta T cell', 'plasmacytoid dendritic cell', 'neutrophil', 'natural killer cell', 'myofibroblast cell', ...
.experimental_factors = 'anoxya', 'stroke'
.ulabels = 'TSP1', 'TSP2', 'TSP14'
You see that the data itself remains in the original storage location, which has been added to the current instance’s storage locations as a read-only location.
ln.Storage.df()
id | uid | root | description | type | region | instance_uid | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 2.0 | 2024-11-05 13:03:25.806435+00:00 | 1 |
1 | 5s4VrKhhJtvU | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | NaN | 2024-11-05 13:03:15.953655+00:00 | 1 |
See the state of the database.
ln.view()
****************
* module: core *
****************
Artifact
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | dPraor9rU1EofcFb6Wph | None | True | Part of Tabula Sapiens, a benchmark, first-dra... | tabula_sapiens_lung.h5ad | .h5ad | None | 3899435772 | 8mB1KK2wd51F6HQdvqipcQ | None | None | sha1-fl | None | 1 | False | 2 | 2 | 2 | 2024-11-05 13:03:25.808444+00:00 | 1 |
! No records found
! No records found
! No records found
Run
id | uid | started_at | finished_at | is_consecutive | reference | reference_type | transform_id | report_id | environment_id | parent_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | dLHC4Mb8gvDbiOkOjYWz | 2024-11-05 13:03:23.518990+00:00 | None | True | None | None | 1 | None | None | NaN | 2024-11-05 13:03:23.519062+00:00 | 1 |
2 | nBwujIvAvV2mlyUNHtYl | 2024-11-05 13:03:25.794988+00:00 | None | None | None | None | 2 | None | None | 1.0 | 2024-11-05 13:03:25.795050+00:00 | 1 |
Storage
id | uid | root | description | type | region | instance_uid | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 2.0 | 2024-11-05 13:03:25.806435+00:00 | 1 |
1 | 5s4VrKhhJtvU | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | NaN | 2024-11-05 13:03:15.953655+00:00 | 1 |
Transform
id | uid | version | is_latest | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 4XIuR0tvaiXM0000 | None | True | Transfer from `laminlabs/lamindata` | transfers/4XIuR0tvaiXM | None | function | None | None | None | None | None | 2024-11-05 13:03:25.791597+00:00 | 1 |
1 | ITeOtm7bhtdq0000 | None | True | Transfer data | transfer.ipynb | None | notebook | None | None | None | None | None | 2024-11-05 13:03:23.513694+00:00 | 1 |
ULabel
id | uid | name | description | reference | reference_type | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|
3 | tZCTk48f | TSP14 | None | None | None | 2 | 2024-11-05 13:03:29.082986+00:00 | 1 |
2 | gk6w8qC5 | TSP2 | None | None | None | 2 | 2024-11-05 13:03:29.071584+00:00 | 1 |
1 | vfLXaHgD | TSP1 | None | None | None | 2 | 2024-11-05 13:03:29.061398+00:00 | 1 |
User
id | uid | handle | name | created_at |
---|---|---|---|---|
1 | 00000000 | anonymous | None | 2024-11-05 13:03:15.949487+00:00 |
******************
* module: bionty *
******************
View lineage:
artifact.view_lineage()
The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:
artifact.transform.name
'Transfer from `laminlabs/lamindata`'
The transform key has shape f"transfers/{source_instance.uid}":
artifact.transform.key
'transfers/4XIuR0tvaiXM'
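Because the key follows a fixed shape, the source instance uid can be recovered from it with plain string handling. The sketch below is illustrative only: transfer_key and source_uid_from_key are hypothetical helper names, not lamindb functions, and the code assumes the "transfers/" prefix shown above.

```python
from typing import Optional

TRANSFER_PREFIX = "transfers/"


def transfer_key(source_instance_uid: str) -> str:
    """Build a key of the shape f"transfers/{source_instance.uid}" (hypothetical helper)."""
    return f"{TRANSFER_PREFIX}{source_instance_uid}"


def source_uid_from_key(key: str) -> Optional[str]:
    """Return the source instance uid, or None if the key is not a transfer key."""
    if not key.startswith(TRANSFER_PREFIX):
        return None
    return key[len(TRANSFER_PREFIX):]


print(transfer_key("4XIuR0tvaiXM"))  # transfers/4XIuR0tvaiXM
print(source_uid_from_key("transfers/4XIuR0tvaiXM"))  # 4XIuR0tvaiXM
print(source_uid_from_key("transfer.ipynb"))  # None
```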
The current notebook run is linked as the parent of the “transfer run”:
artifact.run.parent.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, name='Transfer data', key='transfer.ipynb', type='notebook', created_by_id=1, created_at=2024-11-05 13:03:23 UTC)
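To make the parent link concrete, the following sketch walks from a run up through its .parent links and collects the transform names along the way. It runs on stand-in objects built with SimpleNamespace rather than on real lamindb records; transform_chain is a hypothetical helper, and only the .parent, .transform, and .name attributes shown in this guide are assumed.

```python
from types import SimpleNamespace


def transform_chain(run):
    """Collect transform names from a run up through its `.parent` links."""
    names = []
    while run is not None:
        names.append(run.transform.name)
        run = run.parent
    return names


# stand-in objects mimicking the two runs shown in this guide (not real records)
notebook_run = SimpleNamespace(
    transform=SimpleNamespace(name="Transfer data"),
    parent=None,
)
transfer_run = SimpleNamespace(
    transform=SimpleNamespace(name="Transfer from `laminlabs/lamindata`"),
    parent=notebook_run,
)

print(transform_chain(transfer_run))
```

With a real record, the same function could presumably be called as transform_chain(artifact.run), provided each run exposes .parent and .transform as in the cells above.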
# test the last 3 cells here
assert artifact.transform.name == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "transfers/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.parent.transform.name == "Transfer data"
# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
• deleting instance anonymous/test-transfer