How to decrease model inference time #22

Description

@richa-nvidia

Hi Team,

I am trying to use this for my application logs, after tweaking the security-prompt a bit.

A few observations and questions:

I tried switching to 32B, but performance was even slower.
The analysis takes quite a lot of time: with 100 chunks it took almost 5 minutes, and if we increase the file size to 5k, the analysis takes almost 10 minutes.
Is there a way to speed up inference so that the analysis runs more quickly?
Also, I am running this on a machine with 4 A100 GPUs. Is there another machine configuration on which it would be faster and more accurate?
Do you have any other suggestions for making it run faster and more efficiently?
