Fix: curl manifest futilely includes external LungMAP files (#8027)#8030

Open
nadove-ucsc wants to merge 9 commits into develop from issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files

Conversation

@nadove-ucsc nadove-ucsc commented May 15, 2026

Contributor

Linked issues: #8027

Checklist

Author

  • PR is assigned to the author
  • Status of PR is In progress
  • PR is a draft
  • Target branch is develop
  • Name of PR branch matches issues/<GitHub handle of author>/<issue#>-<slug>
  • PR is linked to all issues it (partially) resolves
  • Status of linked issues is In progress
  • PR description links to linked issues
  • PR title matches¹ that of a linked issue or comment in PR explains why they're different
  • PR title references all linked issues
  • For each linked issue, there is at least one commit whose title references that issue

¹ When the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
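
The branch-name and title conventions in the list above can be checked mechanically. The following is a minimal sketch, not part of the project's tooling: the helper names `parse_branch` and `expected_pr_title` are hypothetical, the slug character set is an assumption, and the `(#<issue>)` suffix is inferred from the title of this PR.

```python
import re

# Assumed shape: issues/<GitHub handle of author>/<issue#>-<slug>.
# The slug is assumed to consist of lowercase alphanumerics and hyphens.
BRANCH_RE = re.compile(
    r'issues/(?P<handle>[^/]+)/(?P<issue>\d+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)'
)


def parse_branch(name: str) -> tuple[str, int, str]:
    """Split a PR branch name into handle, issue number and slug."""
    match = BRANCH_RE.fullmatch(name)
    if match is None:
        raise ValueError(f'Branch name does not follow the convention: {name!r}')
    return match['handle'], int(match['issue']), match['slug']


def expected_pr_title(issue_title: str, issue_number: int) -> str:
    """Build the PR title for an issue whose title describes a problem."""
    return f'Fix: {issue_title} (#{issue_number})'
```

For this PR, `parse_branch` applied to the branch name yields the handle `nadove-ucsc` and issue number `8027`, and `expected_pr_title` applied to the title of the linked issue reproduces the PR title.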

Author (partiality)

  • Added p tag to titles of partial commits
  • This PR is labeled partial or completely resolves all linked issues
  • This PR partially resolves each of the linked issues or does not have the partial label

Author (reindex)

  • Added r tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
  • This PR is labeled reindex:dev or the changes introduced by it will not require reindexing of dev
  • This PR is labeled reindex:anvildev or the changes introduced by it will not require reindexing of anvildev
  • This PR is labeled reindex:anvilprod or the changes introduced by it will not require reindexing of anvilprod
  • This PR is labeled reindex:prod or the changes introduced by it will not require reindexing of prod
  • This PR is labeled reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod

Author (mirror)

  • This PR is labeled mirror:dev or the changes introduced by it will not require mirroring of dev
  • This PR is labeled mirror:anvildev or the changes introduced by it will not require mirroring of anvildev
  • This PR is labeled mirror:anvilprod or the changes introduced by it will not require mirroring of anvilprod
  • This PR is labeled mirror:prod or the changes introduced by it will not require mirroring of prod
  • This PR is labeled mirror:partial and its description documents the specific mirroring procedure for dev, anvildev, anvilprod and prod or requires a full mirroring or carries none of the labels mirror:dev, mirror:anvildev, mirror:anvilprod and mirror:prod

Author (API changes)

  • This PR and its linked issues are labeled API or this PR does not modify a REST API
  • Added a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
  • Updated REST API version number in app.py or this PR does not modify a REST API

Author (upgrading deployments)

  • Ran make docker_images.json and committed the resulting changes or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable
  • Documented upgrading of deployments in UPGRADING.rst or this PR does not require upgrading deployments
  • Added u tag to commit title or this PR does not require upgrading deployments
  • This PR is labeled upgrade or does not require upgrading deployments
  • This PR is labeled deploy:shared or does not modify docker_images.json, and does not require deploying the shared component for any other reason
  • This PR is labeled deploy:gitlab or does not require deploying the gitlab component
  • This PR is labeled deploy:runner or does not require deploying the runner image

Author (hotfixes)

  • Added F tag to main commit title or this PR does not include a permanent fix for a temporary hotfix
  • Reverted the temporary hotfixes for any linked issues or none of the stable branches (anvilprod and prod) have temporary hotfixes for any of the issues linked to this PR

Author (before every review)

  • Rebased PR branch on develop, squashed fixups from prior reviews
  • Ran make requirements_update or this PR does not modify Dockerfile, environment, requirements*.txt, common.mk, Makefile or environment.boot
  • Added R tag to commit title or this PR does not modify requirements*.txt
  • This PR is labeled reqs or does not modify requirements*.txt
  • make integration_test passes in personal deployment or this PR does not modify functionality that could affect the IT outcome
  • PR is awaiting requested review from a peer
  • Status of PR is Review requested
  • PR is assigned to only the peer and the author

Peer reviewer (after approval)

Note that after requesting changes, the PR must be assigned to only the author.

  • Actually approved the PR
  • PR is not a draft
  • PR is awaiting requested review from system administrator
  • Status of PR is Review requested
  • PR is assigned to only the system administrator and the author

System administrator (after approval)

  • Actually approved the PR
  • Labeled linked issues as demo or no demo
  • Commented on linked issues about demo expectations or all linked issues are labeled no demo
  • Decided if PR can be labeled no sandbox
  • A comment to this PR details the completed security design review
  • PR title is appropriate as title of merge commit
  • N reviews label is accurate
  • Status of PR is Approved
  • PR is assigned to only the operator and the author

Operator

  • Checked reindex:… labels and r commit title tag
  • Checked mirror:… labels
  • Checked that demo expectations are clear or all linked issues are labeled no demo
  • Squashed PR branch and rebased onto develop
  • Sanity-checked history
  • Pushed PR branch to GitHub

Operator (deploy .shared and .gitlab components)

  • Ran _select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
  • Ran _select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
  • Ran _select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
  • Ran _select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
  • Checked the items in the next section or this PR is labeled deploy:gitlab
  • PR is assigned to only the system administrator and the author or this PR is not labeled deploy:gitlab

System administrator (post-deploy of .gitlab component)

  • Background migrations for dev.gitlab are complete or this PR is not labeled deploy:gitlab
  • Background migrations for anvildev.gitlab are complete or this PR is not labeled deploy:gitlab
  • PR is assigned to only the operator and the author

Operator (deploy runner image)

  • Ran _select dev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
  • Ran _select anvildev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner

Operator (sandbox build)

  • Added sandbox label or PR is labeled no sandbox
  • Pushed PR branch to GitLab dev or PR is labeled no sandbox
  • Pushed PR branch to GitLab anvildev or PR is labeled no sandbox
  • Build passes in sandbox deployment or PR is labeled no sandbox
  • Build passes in anvilbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in sandbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in anvilbox deployment or PR is labeled no sandbox
  • Deleted unreferenced indices in sandbox or this PR does not remove catalogs or otherwise cause unreferenced indices in sandbox
  • Deleted unreferenced indices in anvilbox or this PR does not remove catalogs or otherwise cause unreferenced indices in anvilbox
  • Started reindex in sandbox or this PR is not labeled reindex:dev
  • Started reindex in anvilbox or this PR is not labeled reindex:anvildev
  • Checked for failures in sandbox or this PR is not labeled reindex:dev
  • Checked for failures in anvilbox or this PR is not labeled reindex:anvildev
  • Started mirroring in sandbox or this PR is not labeled mirror:dev
  • Started mirroring in anvilbox or this PR is not labeled mirror:anvildev
  • Checked for failures in sandbox or this PR is not labeled mirror:dev
  • Checked for failures in anvilbox or this PR is not labeled mirror:anvildev

Operator (merge the branch)

  • All status checks passed and the PR is mergeable
  • The title of the merge commit starts with the title of this PR
  • Added PR # reference to merge commit title
  • Collected commit title tags in merge commit title but only included p if the PR is also labeled partial
  • Pushed merge commit to GitHub
  • Status of PR is Merged lower
  • Status of blocked issues is Triage or no issues are blocked on the linked issues

Operator (main build)

  • Pushed merge commit to GitLab dev
  • Pushed merge commit to GitLab anvildev
  • Build passes on GitLab dev
  • Reviewed build logs for anomalies on GitLab dev
  • Build passes on GitLab anvildev
  • Reviewed build logs for anomalies on GitLab anvildev
  • Ran _select dev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
  • Ran _select anvildev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
  • Deleted PR branch from GitHub
  • PR is assigned to only the operator
  • Deleted PR branch from GitLab dev
  • Deleted PR branch from GitLab anvildev
  • Status of linked issues is Lower, or Triage, if PR is partial

Operator (reindex)

  • Deindexed all unreferenced catalogs in dev or this PR is neither labeled reindex:partial nor reindex:dev
  • Deindexed all unreferenced catalogs in anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
  • Deindexed specific sources in dev or this PR is neither labeled reindex:partial nor reindex:dev
  • Deindexed specific sources in anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
  • Indexed specific sources in dev or this PR is neither labeled reindex:partial nor reindex:dev
  • Indexed specific sources in anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
  • Started reindex in dev or this PR does not require reindexing dev
  • Started reindex in anvildev or this PR does not require reindexing anvildev
  • Checked for, triaged and possibly requeued messages in both fail queues in dev or this PR does not require reindexing dev
  • Checked for, triaged and possibly requeued messages in both fail queues in anvildev or this PR does not require reindexing anvildev
  • Emptied fail queues in dev or this PR does not require reindexing dev
  • Emptied fail queues in anvildev or this PR does not require reindexing anvildev
  • Restarted the Data Browser pipeline for the ucsc/hca/dev branch on GitLab in dev or this PR does not require reindexing dev
  • Restarted the Data Browser pipeline for the ucsc/lungmap/dev branch on GitLab in dev or this PR does not require reindexing dev
  • Restarted deploy_browser job in the GitLab pipeline for this PR in dev or this PR does not require reindexing dev
  • Restarted the Data Browser pipeline for the ucsc/anvil/anvildev branch on GitLab in anvildev or this PR does not require reindexing anvildev
  • Restarted deploy_browser job in the GitLab pipeline for this PR in anvildev or this PR does not require reindexing anvildev

Operator (mirroring)

  • Started mirroring in dev or this PR is not labelled mirror:dev
  • Started mirroring in anvildev or this PR is not labelled mirror:anvildev
  • Checked for, triaged and possibly requeued messages in mirror fail queue in dev or this PR is not labelled mirror:dev
  • Checked for, triaged and possibly requeued messages in mirror fail queue in anvildev or this PR is not labelled mirror:anvildev
  • Emptied mirror fail queue in dev or this PR is not labelled mirror:dev
  • Emptied mirror fail queue in anvildev or this PR is not labelled mirror:anvildev

Operator

  • Propagated the deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod, reindex:prod, mirror:partial, mirror:anvilprod and mirror:prod labels to the next promotion PRs or this PR carries none of these labels
  • Propagated any specific instructions related to the deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod, reindex:prod, mirror:partial, mirror:anvilprod and mirror:prod labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels
  • PR is assigned to no one

Shorthand for review comments

  • L line is too long
  • W line wrapping is wrong
  • Q bad quotes
  • F other formatting problem

@nadove-ucsc nadove-ucsc self-assigned this May 15, 2026
@nadove-ucsc nadove-ucsc linked an issue May 15, 2026 that may be closed by this pull request
@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch from a70d2de to 9e2cb1f Compare May 15, 2026 22:18
Comment thread src/azul/service/index_service.py Fixed
@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch 3 times, most recently from 3194880 to e012e86 Compare May 15, 2026 23:04
@coveralls

coveralls commented May 15, 2026


Coverage Status

coverage: 84.932% (+0.001%) from 84.931% — issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files into develop

@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch 2 times, most recently from 4648e95 to ace1dd9 Compare May 18, 2026 01:19
@codecov

codecov Bot commented May 18, 2026


Codecov Report

❌ Patch coverage is 97.87234% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.87%. Comparing base (204945e) to head (8c34132).

Files with missing lines Patch % Lines
src/azul/service/manifest_service.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8030      +/-   ##
===========================================
+ Coverage    84.85%   84.87%   +0.01%     
===========================================
  Files          165      165              
  Lines        24177    24206      +29     
===========================================
+ Hits         20515    20544      +29     
  Misses        3662     3662              


@achave11-ucsc achave11-ucsc left a comment

Member

To confirm, these changes do not require re-indexing?

Also, consider adding a (unit/integration) test that covers the new elif branch.

Comment thread src/azul/service/index_service.py
Comment thread src/azul/service/query_service.py Outdated


@attr.s(auto_attribs=True, kw_only=True, frozen=True)
@attrs.frozen(auto_attribs=True, frozen=True)

Member

It seems frozen is redundant and kw_only was dropped.

Suggested change
@attrs.frozen(auto_attribs=True, frozen=True)
@attrs.frozen(auto_attribs=True, kw_only=True)

Comment thread src/azul/service/query_service.py Outdated
config.catalogs[catalog].atlas == 'lungmap'
and drs_uri.startswith('drs://dg.4503:')
):
# Lungmap contains external files hosted on BioDataCatalyst.

Member

Suggested change
# Lungmap contains external files hosted on BioDataCatalyst.
# LungMAP contains external files hosted on BioDataCatalyst.

Comment thread src/azul/service/index_service.py Outdated
@cache
def mirror_service(self, catalog: CatalogName) -> MirrorService:
return MirrorService(catalog=catalog)
@attrs.frozen(kw_only=True)

Member

Like the sibling class definitions in query_service.py?

Suggested change
@attrs.frozen(kw_only=True)
@attrs.frozen(auto_attribs=True, kw_only=True)

@achave11-ucsc achave11-ucsc removed their assignment May 19, 2026
@nadove-ucsc

Contributor Author

> To confirm, these changes do not require re-indexing?

Confirmed. The affected fields in the response are synthesized by the service and are not present in the index.

@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch 2 times, most recently from 4fb3f6f to 275c9bc Compare May 19, 2026 02:06
@nadove-ucsc nadove-ucsc requested a review from achave11-ucsc May 19, 2026 02:31
achave11-ucsc previously approved these changes May 19, 2026

@achave11-ucsc achave11-ucsc left a comment

Member

Approved ✅

@achave11-ucsc achave11-ucsc marked this pull request as ready for review May 19, 2026 04:49

@hannes-ucsc hannes-ucsc left a comment

Member

Please remove the FileUrlService refactoring and only fix the issue at hand, plus the test. Leave the attrs conversion, but make sure it is done correctly (see comment below).

Please clear these types of refactorings with me before embarking on them. Feel free to create an issue for the refactoring and motivate the refactoring in the description of that issue.

Comment thread src/azul/service/index_service.py Outdated
@cache
def mirror_service(self, catalog: CatalogName) -> MirrorService:
return MirrorService(catalog=catalog)
@attrs.frozen(auto_attribs=True, kw_only=True)

Member

Suggested change
@attrs.frozen(auto_attribs=True, kw_only=True)
@attrs.frozen(kw_only=True)

The new API infers it automatically.

@hannes-ucsc hannes-ucsc removed their assignment May 19, 2026
@hannes-ucsc hannes-ucsc added the 1 review [process] Lead requested changes once label May 19, 2026
@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch from 40836f1 to 71ca313 Compare May 21, 2026 06:31
@hannes-ucsc hannes-ucsc added 3 reviews [process] Lead requested changes thrice and removed 2 reviews [process] Lead requested changes twice labels May 22, 2026
@hannes-ucsc hannes-ucsc removed their assignment May 22, 2026
@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch from bd29c06 to 26cfd2d Compare May 22, 2026 18:35
@nadove-ucsc nadove-ucsc requested a review from hannes-ucsc May 22, 2026 18:48

@hannes-ucsc hannes-ucsc left a comment

Member

The new test is a bit lazy. It indexes two bundles, even though only one file in one bundle is patched. The same bundles were indexed already in a different test. The need for extracting a superclass stems from the fact that these test cases index their bundles during class setup, so the patching needs to be done at class setup, and therefore a new class is needed.

We'd get more bang for the buck if we actually canned two LungMAP bundles, one from a project with compact DRS URIs, and one from a project with host-based DRS URIs. We wouldn't extend the test duration but we would get more meaningful coverage.

Also, I'd like the main commit to be split and pushed individually so that I can see the test fail.
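
The constraint described in this comment (bundles are indexed during class setup, so any patching must already be active at that point) can be sketched with the standard library alone. All names below (`CatalogSettings`, `IndexedTestCase`, `LungmapCatalogTestCase`, `TestLungmapResponse`) are hypothetical stand-ins, not the actual Azul test classes, and "indexing" is reduced to recording which atlas was in effect.

```python
import unittest
from unittest import mock


class CatalogSettings:
    # Hypothetical stand-in for the catalog configuration
    atlas = 'hca'


class IndexedTestCase(unittest.TestCase):
    """Indexes the bundles once per class, during class setup."""

    @classmethod
    def bundles(cls) -> list:
        return []

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # "Indexing" captures whatever configuration is in effect right now
        cls.indexed = [(CatalogSettings.atlas, b) for b in cls.bundles()]


class LungmapCatalogTestCase(unittest.TestCase):
    """Patches the configuration before any cooperating setup runs."""

    @classmethod
    def setUpClass(cls):
        patcher = mock.patch.object(CatalogSettings, 'atlas', 'lungmap')
        patcher.start()
        # Undo the patch after the last test of the class has finished
        cls.addClassCleanup(patcher.stop)
        super().setUpClass()


class TestLungmapResponse(LungmapCatalogTestCase, IndexedTestCase):

    @classmethod
    def bundles(cls) -> list:
        return ['bundle-a']

    def test_indexed_under_lungmap(self):
        self.assertEqual([('lungmap', 'bundle-a')], self.indexed)
```

Listing the patching class first in the bases places its `setUpClass` ahead of the indexing one in the method resolution order, so the patch is active while the bundles are indexed. Patching inside an individual test method would come too late.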

Comment thread test/service/test_response.py Outdated
}

external_file_uuid = '27fc1a2e-d70e-47ee-a4b7-92bf57e5b7a6'
# Compact identifier for BioDataCatalyst

Member

Suggested change
# Compact identifier for BioDataCatalyst

Contributor Author

> The need for extracting a superclass stems from the fact that these test cases...

The need for extracting a superclass stems from the fact that the new test uses a different catalog configuration due to requiring the Config.Catalog.atlas property to be set to 'lungmap' instead of 'hca', regardless of how/which bundles are indexed.

@nadove-ucsc nadove-ucsc May 28, 2026

Contributor Author

This bundle naturally contains a mixture of host-based and compact DRS URIs, so I just canned that one.

Member

> > The need for extracting a superclass stems from the fact that these test cases...
>
> The need for extracting a superclass stems from the fact that the new test uses a different catalog configuration due to requiring the Config.Catalog.atlas property to be set to 'lungmap' instead of 'hca', regardless of how/which bundles are indexed.

I don't see how you would be able to avoid the superclass and new subclass, even if you were able to patch the catalog per test method. I think both reasons are valid and I didn't claim that the reason I gave was exclusive.

@hannes-ucsc hannes-ucsc removed their assignment May 23, 2026
@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch 2 times, most recently from f9c1dee to 0699599 Compare May 28, 2026 05:05
@nadove-ucsc nadove-ucsc requested a review from hannes-ucsc May 28, 2026 05:25

@hannes-ucsc hannes-ucsc left a comment

Member

The bundle you canned is pretty large. Make sure you can the smallest possible bundle. I don't mind indexing two bundles if you can't find a small bundle that has both types of DRS URIs. Document your process of finding the bundles to can.

Index: test/service/test_response.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/service/test_response.py b/test/service/test_response.py
--- a/test/service/test_response.py	(revision 5fd60826db7275ef02493d217ca1fdc2b95287ae)
+++ b/test/service/test_response.py	(date 1780014578969)
@@ -3772,7 +3772,7 @@
         return one(response.json()['files'])
 
 
-class TestResponseWithDCP2Cans(DCP2ResponseTestCase):
+class TestResponseWithHCADCP2Cans(DCP2ResponseTestCase):
 
     @classmethod
     def bundles(cls) -> list[SourcedBundleFQID]:
@@ -3864,7 +3864,7 @@
         self.assertEqual(expected_tree, project['contributedAnalyses'])
 
 
-class TestExternalLungmapFiles(LungmapTestCase, DCP2ResponseTestCase):
+class TestResponseWithLungmapCans(LungmapTestCase, DCP2ResponseTestCase):
 
     @classmethod
     def bundles(cls) -> list[SourcedBundleFQID]:
@@ -3873,20 +3873,19 @@
                             version='2024-06-25T15:12:06.304736Z'),
         ]
 
-    def test_file_url(self):
-        with self.subTest('host-based DRS URI'):
-            file = self.get_file('005f71ed-fcba-4026-bc97-a8c9707f26ee')
-            self.assertEqual('drs://jade.datarepo-dev.broadinstitute.org/v2_85c93a64-a9d0-4331-ad36-53a6498e57fc',
-                             file['drs_uri'])
-            expected_file_url = str(self.base_url.set(
-                path='/repository/files/005f71ed-fcba-4026-bc97-a8c9707f26ee',
-                args=dict(catalog=self.catalog,
-                          version='2023-10-31T16:42:42.419707Z')
-            ))
-            self.assertEqual(expected_file_url, file['azul_url'])
+    def test_file_with_host_based_drs_uri(self):
+        file = self.get_file('005f71ed-fcba-4026-bc97-a8c9707f26ee')
+        self.assertEqual('drs://jade.datarepo-dev.broadinstitute.org/v2_85c93a64-a9d0-4331-ad36-53a6498e57fc',
+                         file['drs_uri'])
+        expected_file_url = str(self.base_url.set(
+            path='/repository/files/005f71ed-fcba-4026-bc97-a8c9707f26ee',
+            args=dict(catalog=self.catalog,
+                      version='2023-10-31T16:42:42.419707Z')
+        ))
+        self.assertEqual(expected_file_url, file['azul_url'])
 
-        with self.subTest('compact DRS URI'):
-            file = self.get_file('002ebcd6-722d-434d-b21b-13a06e659a67')
-            self.assertEqual('drs://dg.4503:696f302b-ecd9-4bcc-a750-56c22b496f02',
-                             file['drs_uri'])
-            self.assertIsNone(file['azul_url'])
+    def test_file_with_compact_drs_uri(self):
+        file = self.get_file('002ebcd6-722d-434d-b21b-13a06e659a67')
+        self.assertEqual('drs://dg.4503:696f302b-ecd9-4bcc-a750-56c22b496f02',
+                         file['drs_uri'])
+        self.assertIsNone(file['azul_url'])

Comment thread src/azul/service/index_service.py Outdated
config.catalogs[self.catalog].atlas == 'lungmap'
and isinstance(DRSURI.parse(drs_uri), CompactDRSURI)
):
# LungMAP contains external files hosted on BioDataCatalyst.

Member

Suggested change
# LungMAP contains external files hosted on BioDataCatalyst.
# LungMAP contains files not hosted on TDR.

… and remove other mentions of BDC you may have added.
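
The condition under review can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the snippet above uses `DRSURI.parse` and `CompactDRSURI`, whereas this sketch substitutes a simple string check, and the names `is_compact_drs_uri`, `azul_file_url` and the base URL are hypothetical. The expected outcomes follow the test in this PR: a host-based DRS URI yields a `/repository/files/…` URL, a compact one in a LungMAP catalog yields none.

```python
from typing import Optional
from urllib.parse import urlencode

DRS_SCHEME = 'drs://'


def is_compact_drs_uri(drs_uri: str) -> bool:
    """
    Simplified check: treat any colon after the scheme as the separator of a
    compact identifier (`namespace:accession`). This assumes that host-based
    DRS URIs carry neither a port nor a colon in the object ID.
    """
    if not drs_uri.startswith(DRS_SCHEME):
        raise ValueError(f'Not a DRS URI: {drs_uri!r}')
    return ':' in drs_uri[len(DRS_SCHEME):]


def azul_file_url(*,
                  atlas: str,
                  base_url: str,
                  catalog: str,
                  file_uuid: str,
                  version: str,
                  drs_uri: str
                  ) -> Optional[str]:
    """Return a service URL for the file, or None if the file is external."""
    if atlas == 'lungmap' and is_compact_drs_uri(drs_uri):
        # LungMAP contains files not hosted on TDR
        return None
    query = urlencode(dict(catalog=catalog, version=version))
    return f'{base_url}/repository/files/{file_uuid}?{query}'
```

Returning `None` rather than a URL lets downstream consumers, such as the curl manifest, skip files that the service cannot serve.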

@nadove-ucsc

Contributor Author

> The bundle you canned is pretty large. Make sure you can the smallest possible bundle. I don't mind indexing two bundles if you can't find a small bundle that has both types of DRS URIs. Document your process of finding the bundles to can.

There is only one bundle on dev that contains compact DRS URIs.

$ curl 'https://service.dev.singlecell.gi.ucsc.edu/index/files?catalog=lm2&size=1000' | jq -c .hits[] | grep -P 'drs://dg\.\d' | jq .bundles[].bundleUuid | sort | uniq -c
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1002k  100 1002k    0     0   349k      0  0:00:02  0:00:02 --:--:--  349k
     74 "1928ae54-dbba-33e6-9e40-6a9b3fe8585f"
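
For reference, the same count can be reproduced in Python once the response body has been parsed. This is a sketch under assumptions: the function name `count_bundles_with_compact_drs` is hypothetical, and, like the `grep` in the pipeline above, it searches the whole serialized hit rather than relying on a particular field holding the DRS URI.

```python
import json
import re
from collections import Counter

# Same pattern as in the grep invocation above
COMPACT_DRS = re.compile(r'drs://dg\.\d')


def count_bundles_with_compact_drs(hits: list) -> Counter:
    """Count, per bundle UUID, the hits that mention a compact DRS URI."""
    counts = Counter()
    for hit in hits:
        # Mirrors `jq -c .hits[] | grep`: match anywhere in the serialized hit
        if COMPACT_DRS.search(json.dumps(hit)):
            for bundle in hit['bundles']:
                counts[bundle['bundleUuid']] += 1
    return counts
```

Given the parsed response as `response`, `count_bundles_with_compact_drs(response['hits'])` returns a mapping from bundle UUID to the number of matching hits, which corresponds to the output of `sort | uniq -c`.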

I lack the permissions needed to can bundles from prod.

$ python3 scripts/can_bundle.py --source "tdr:bigquery:gcp:datarepo-72daf113:hca_prod_87f519b4886241f9acff75e823e0e430__20240301_dcp2_20260331_dcp59" --uuid "fbe4739e-12f1-4942-aacd-84af48199ab7" --version "2026-03-25T12:44:28.117000Z"
2026-05-28 20:16:51,910    INFO MainThread botocore.credentials: Found credentials in shared credentials file: ~/.aws/credentials
2026-05-28 20:16:51,939    INFO MainThread azul.deployment: Allocated new Boto3 client for 'secretsmanager' with ID 131396154115808
Traceback (most recent call last):
...
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::542754589326:assumed-role/viewer/nadove@ucsc.edu is not authorized to perform: secretsmanager:GetSecretValue on resource: dcp/azul/prod/google_service_account because no identity-based policy allows the secretsmanager:GetSecretValue action

I have manually removed most of the files from the canned bundle to reduce it to a more reasonable size.

@nadove-ucsc nadove-ucsc force-pushed the issues/nadove-ucsc/8027-curl-manifest-includes-external-lungmap-files branch from 5fd6082 to be1ff25 Compare May 29, 2026 04:08
@nadove-ucsc nadove-ucsc requested a review from hannes-ucsc May 29, 2026 06:12

@hannes-ucsc hannes-ucsc left a comment

Member

Please can production bundles for the test. You observed that you lack the permissions to do so. This can be remedied easily. You should have permissions to read the metadata in BigQuery. If not, please ping me on Slack. Once you have identified suitable bundle(s), ask the operator for help to actually can them. Don't forget to document your process/reasoning of identifying suitable bundles.

Labels

3 reviews [process] Lead requested changes thrice

Projects

None yet

Development

Successfully merging this pull request may close these issues.

curl manifest futilely includes external LungMAP files

5 participants