Bug(s) in UniProt mapping()

Hello, I was trying to use the UniProt().mapping() function and I noticed two potential issues. 

### Unnecessary duplication of failed IDs

In this batching code:
```
for x in tqdm.tqdm(range(size, total, size), disable=not progress):
    link = self._get_next_link(self.services.last_response.headers)
    batch = self.services.http_get(link, frmt="json")
    batches += batch["results"]
    fails += results.get("failedIds", [])  # <-- the line in question: reads `results`, not `batch`
return {"results": batches, "failedIds": fails}
```

First, `results` is assigned outside the loop, so each iteration just appends the same "failedIds" values to `fails` again.
Second, it looks like the initial response returns ALL failed IDs, not just those for the first 25 results, so adding to `fails` inside the loop is unnecessary.
* I tested this in debug mode by sending 1000 Ensembl IDs. The first `results = self.services.http_get()` call returned 106 failed IDs, and the next call inside the loop (`batch = self.services.http_get(link, frmt="json")`) returned the same 106 IDs.
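To illustrate the behaviour I would expect, here is a minimal, library-independent sketch. The names `collect_mapping_pages` and `fetch_next_page` are hypothetical and are not part of the current code. It assumes, based on the test above, that the first response already contains every failed ID.

```python
from typing import Callable, Optional


def collect_mapping_pages(
    first_page: dict,
    fetch_next_page: Callable[[], Optional[dict]],
) -> dict:
    """Merge paginated mapping responses, reading failedIds only once.

    `fetch_next_page` is a hypothetical callable that returns the next
    page as a dict, or None when there are no more pages.
    """
    results = list(first_page.get("results", []))
    # Assumption: the first response lists every failed ID, so later
    # pages are not consulted for them.
    failed = list(first_page.get("failedIds", []))

    while True:
        page = fetch_next_page()
        if page is None:
            break
        results.extend(page.get("results", []))

    return {"results": results, "failedIds": failed}
```

With this shape, the failed IDs appear once in the output no matter how many pages are fetched.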

### Using the wrong URL to fetch results
I tried to send a set of 1000 Ensembl IDs (from "Ensembl", to "UniProtKB-Swiss-Prot"). The call stalled at around batch 25, consumed all of my computer's memory, and killed the kernel. I've repeated this several times with the same result. After some investigating, it looks like `idmapping/status/{job_id}` returns a HUGE dictionary containing every piece of data UniProt holds for each matching protein, including the protein sequence, lists of all alternative sequences, and pieces of evidence. That is a massive amount of data, and it very quickly eats up all available memory.

On the other hand, using `idmapping/results/{job_id}` returns just the ID, like `{'from': 'ENSG00000080815', 'to': 'P49768'}`, which is what I would expect from an ID mapping call.

Looking at the [UniProt website about the API](https://www.uniprot.org/help/id_mapping_prog), it looks like `/status` is intended for checking whether the job is done and is **supposed** to return something like `{"jobStatus":"FINISHED", ...}`. In practice it seems to return the first 25 results with all the available UniProt data instead, so that might be an error on UniProt's end?

Either way, it seems like `idmapping/results/{job_id}` is the URL that is supposed to be used to retrieve results, and using it avoids the memory issues.
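As a rough sketch of what I mean, the snippet below builds the `results` URL and fetches the compact mappings from it. The function names and the injected `http_get` callable are hypothetical, and the base URL is my assumption about where the API lives. This is not how the library is currently structured.

```python
from typing import Callable, List

# Assumed base URL for the UniProt REST API; adjust if the library uses another.
API_BASE = "https://rest.uniprot.org"


def results_url(job_id: str, base: str = API_BASE) -> str:
    """Build the URL that returns compact {'from': ..., 'to': ...} pairs."""
    return f"{base}/idmapping/results/{job_id}"


def fetch_compact_mappings(
    job_id: str,
    http_get: Callable[[str], dict],
) -> List[dict]:
    """Fetch mappings from the results endpoint rather than the status one.

    `http_get` is a hypothetical callable that takes a URL and returns
    the decoded JSON response as a dict.
    """
    response = http_get(results_url(job_id))
    return response.get("results", [])
```

Passing `http_get` in as a parameter is only to keep the sketch testable without network access. In the library itself this would presumably be `self.services.http_get`.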

If you would like, I'd be happy to fork this repo and submit a PR with fixes. Otherwise I can leave it to your team, whichever is easiest for you.
