pfdicom_tagSub

  • pfdicom_tagSub reads/edits/saves DICOM meta information. It can be used to anonymize DICOM header data.

pfdicom_tagSub replaces a set of <tag, value> pairs in a DICOM header with values passed in a JSON structure. Individual DICOM tags can be explicitly referenced in the JSON structure, as well as a regular expression construct to capture all tags satisfying that expression (allowing for idiomatic bulk substitution of <tag, value> pairs).

Tag regular expression constructs are Python regular expressions, written as strings of the form "re:<pythonRegex>". For example, "re:.*hysician" will perform some substitution on all tags whose names contain the letters hysician. The value substitution has access to a special lookup, #tag, which is the current tag hit. Built-in functions can be applied to the tag hit, for example md5 hashing with "%_md5|4_#tag". Thus,

{
    "re:.*hysician":                "%_md5|4_#tag"
}

will be expanded to

{
    "PerformingPhysiciansName" :    "%_md5|4_PerformingPhysiciansName",
    "PhysicianofRecord"        :    "%_md5|4_PhysicianofRecord",
    "ReferringPhysiciansName"  :    "%_md5|4_ReferringPhysiciansName",
    "RequestingPhysician"      :    "%_md5|4_RequestingPhysician"
}

The tag regular expression construct allows for simple and powerful bulk substitution of <tag, value> pairs.
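
The expansion described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function name expand_tag_struct and the use of re.search are assumptions made for illustration, and the list of tag names stands in for the tags read from a real DICOM header.

```python
import re

def expand_tag_struct(tag_struct, available_tags):
    """Expand 're:<regex>' keys into one entry per matching tag name.

    Hypothetical helper: explicit keys are copied unchanged, while each
    regex key yields one entry for every tag name the regex matches.
    """
    expanded = {}
    for key, value in tag_struct.items():
        if key.startswith("re:"):
            pattern = re.compile(key[3:])
            for tag in available_tags:
                if pattern.search(tag):
                    # '#tag' stands for the tag name that was hit
                    expanded[tag] = value.replace("#tag", tag)
        else:
            expanded[key] = value
    return expanded

# Tag names as they might appear in a DICOM header (illustrative only)
tags = ["PatientName", "PerformingPhysiciansName", "PhysicianofRecord",
        "ReferringPhysiciansName", "RequestingPhysician"]
result = expand_tag_struct({"re:.*hysician": "%_md5|4_#tag"}, tags)
```

Here result holds the four physician entries shown in the expansion above, and PatientName is left out because it does not match the expression.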

The script accepts an <inputDir>, and from this point an os.walk() is performed to extract all the subdirs. Each subdir is examined for DICOM files (in the simplest sense by a file extension mapping), and these files are passed to a processing method that reads and replaces the specified DICOM tags, saving the result in a corresponding directory and filename in the output tree.
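
As a rough sketch of this walk-and-mirror step, the following maps each matching input file to its counterpart in the output tree. The function plan_outputs is hypothetical and only plans the paths; the actual tool delegates the tree traversal to pftree and the DICOM handling to pfdicom.

```python
import os

def plan_outputs(input_dir, output_dir, extension="dcm"):
    """Return (source, destination) pairs for files matching the extension.

    Hypothetical helper: the destination mirrors the source's position
    relative to input_dir, rooted at output_dir.
    """
    pairs = []
    for root, _dirs, files in os.walk(input_dir):
        rel = os.path.relpath(root, input_dir)
        for name in sorted(files):
            # Simplest possible DICOM detection: match on file extension
            if name.lower().endswith("." + extension.lower()):
                src = os.path.join(root, name)
                dst = os.path.normpath(os.path.join(output_dir, rel, name))
                pairs.append((src, dst))
    return pairs
```

Calling plan_outputs("/var/www/html/normsmall", "/var/www/html/anon") would list every .dcm file under the input directory together with the path it would be saved to.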

The following dependencies must be installed on your host system or python3 virtual env (they are installed automatically if pfdicom_tagSub is pulled from PyPI):

  • pfmisc (various misc modules and classes for the pf* family of objects)
  • pftree (create a dictionary representation of a filesystem hierarchy)
  • pfdicom (handle underlying DICOM file reading)

The best method of installing this script and all of its dependencies is by fetching it from PyPI:

pip3 install pfdicom_tagSub

[--inputDir <inputDir>]
Input DICOM directory to examine. By default, the first file in this
directory is examined for its tag information. There is an implicit
assumption that each <inputDir> contains a single DICOM series.

[--inputFile <inputFile>]
An optional <inputFile> specified relative to the <inputDir>. If
specified, then do not perform a directory walk, but convert only
this file.

[--fileFilter <fileFilter>]
An optional extension to filter the DICOM files of interest from the
<inputDir>.

[--outputDir <outputDir>]
The output root directory that will contain a tree structure identical
to the input directory, and each "leaf" node will contain the analysis
results.

[--outputLeafDir <outputLeafDirFormat>]
If specified, will apply the <outputLeafDirFormat> to the output
directories containing data. This is useful to blanket describe
final output directories with some descriptive text, such as
'anon' or 'preview'.

This is a formatting spec, so

    --outputLeafDir 'preview-%s'

where %s is the original leaf directory node, will prefix each
final directory containing output with the text 'preview-' which
can be useful in describing some features of the output set.
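
A minimal sketch of how such a format spec could be applied to a leaf directory (the helper apply_leaf_format is hypothetical, not the tool's own code):

```python
import os

def apply_leaf_format(output_path, leaf_format):
    """Rename the final directory component using a '%s' format spec."""
    parent, leaf = os.path.split(os.path.normpath(output_path))
    # '%s' in the spec is replaced by the original leaf directory name
    return os.path.join(parent, leaf_format % leaf)

renamed = apply_leaf_format("/var/www/html/anon/series1", "preview-%s")
```

With the spec 'preview-%s', only the leaf directory series1 is renamed; the rest of the output path is unchanged.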

[--tagFile <JSONtagFile>]
Parse the tags and their "subs" from a JSON formatted <JSONtagFile>.

[--tagStruct <JSONtagStructure>]
Parse the tags and their "subs" from a JSON formatted <JSONtagStructure>
string passed directly in the command line. Note that sometimes protecting
a JSON string can be tricky, especially when used in scripts or as variable
expansions. If the JSON string is problematic, use the [--tagInfo <string>]
instead.

[--tagInfo <delimited_parameters>]
A token delimited string that is reconstructed into a JSON structure by the
script. This is often useful if the [--tagStruct] JSON string is hard to
parse in scripts and variable passing within scripts. The format of this
string is:

        "<tag1><splitKeyValue><value1><split_token><tag2><splitKeyValue><value2>"

for example:

        --splitToken ","
        --splitKeyValue ':'
        --tagInfo "PatientName:anon,PatientID:%_md5|7_PatientID"

or more complexly (esp if the ':' is part of the key):

        --splitToken "++"
        --splitKeyValue "="
        --tagInfo "PatientBirthDate = %_strmsk|******01_PatientBirthDate ++
                   re:.*hysician    = %_md5|4_#tag"


[--splitToken <split_token>]
The token on which to split the <delimited_parameters> string.
Default is '++'.

[--splitKeyValue <keyValueSplit>]
The token on which to split the <key> <value> pair. Default is ':'
but this can be problematic if the <key> itself has a ':' (for example
in the regular expression expansion).
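
The reconstruction of a delimited --tagInfo string into a JSON structure can be sketched as follows. This is an illustration under assumptions, not the script's own parser: the function name tag_info_to_struct is hypothetical, and the whitespace stripping is assumed from the examples above.

```python
import json

def tag_info_to_struct(tag_info, split_token="++", split_key_value=":"):
    """Rebuild a {tag: value} dictionary from a token-delimited string."""
    struct = {}
    for pair in tag_info.split(split_token):
        if not pair.strip():
            continue
        # Split on the FIRST key/value separator only
        key, value = pair.split(split_key_value, 1)
        struct[key.strip()] = value.strip()
    return struct

simple = tag_info_to_struct(
    "PatientName:anon,PatientID:%_md5|7_PatientID", split_token=",")

complex_ = tag_info_to_struct(
    "PatientBirthDate = %_strmsk|******01_PatientBirthDate ++ "
    "re:.*hysician = %_md5|4_#tag",
    split_token="++", split_key_value="=")

print(json.dumps(complex_, indent=4))
```

Splitting "re:.*hysician:%_md5|4_#tag" on the default ':' would yield the key "re" rather than "re:.*hysician", which is why a different --splitKeyValue is advisable whenever a regular expression key is used.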

[--outputFileStem <outputFileStem>]
The output file stem to store data. This should *not* have a file
extension, or rather, any "." chars. Dots in the name are considered
part of the stem and are *not* considered extensions.

[--removePrivateTags]
If specified, remove all the private tag elements from the input DICOMs.

[--threads <numThreads>]
If specified, break the innermost analysis loop into <numThreads>
threads.

[--man]
Show full help.

[--synopsis]
Show brief help.

[--json]
If specified, output a JSON dump of final return.

[--followLinks]
If specified, follow symbolic links.

[--verbosity <level>]
Set the app verbosity level.

    0: No internal output;
    1: Run start / stop output notification;
    2: As with level '1' but with simpleProgress bar in 'pftree';
    3: As with level '2' but with list of input dirs/files in 'pftree';
    5: As with level '3' but with explicit file logging for
            - read
            - analyze
            - write

Perform a DICOM anonymization by processing specific tags:

pfdicom_tagSub                                      \
    --fileFilter dcm                                \
    --inputDir /var/www/html/normsmall              \
    --outputDir /var/www/html/anon                  \
    --tagStruct '
    {
        "PatientName":              "%_name|patientID_PatientName",
        "PatientID":                "%_md5|7_PatientID",
        "AccessionNumber":          "%_md5|8_AccessionNumber",
        "PatientBirthDate":         "%_strmsk|******01_PatientBirthDate",
        "re:.*hysician":            "%_md5|4_#tag",
        "re:.*stitution":           "#tag",
        "re:.*ddress":              "#tag"
    }
    ' --threads 0 --printElapsedTime

-- OR equivalently --

pfdicom_tagSub                                      \
    --fileFilter dcm                                \
    --inputDir /var/www/html/normsmall              \
    --outputDir /var/www/html/anon                  \
    --splitToken ","                                \
    --splitKeyValue "="                             \
    --tagInfo '
        PatientName         =  %_name|patientID_PatientName,
        PatientID           =  %_md5|7_PatientID,
        AccessionNumber     =  %_md5|8_AccessionNumber,
        PatientBirthDate    =  %_strmsk|******01_PatientBirthDate,
        re:.*hysician       =  %_md5|4_#tag,
        re:.*stitution      =  #tag,
        re:.*ddress         =  #tag
    ' --threads 0 --printElapsedTime

will replace the explicitly named tags as shown:

  • the PatientName value will be replaced with a fake name, seeded on the PatientID;
  • the PatientID value will be replaced with the first 7 characters of an md5 hash of the PatientID;
  • the AccessionNumber value will be replaced with the first 8 characters of an md5 hash of the AccessionNumber;
  • the PatientBirthDate value will set the final two characters, i.e. the day of birth, to 01 and preserve the other birthdate values;
  • any tags whose names contain the substring hysician will have their values replaced with the first 4 characters of the md5 hash of the corresponding tag value;
  • any tags whose names contain the substrings stitution or ddress will have the corresponding value simply set to the tag name.
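
The md5 and string-mask substitutions listed above can be approximated with the sketch below. Both helpers (md5_prefix, str_mask) are hypothetical. The rule that '*' keeps the original character is inferred from the PatientBirthDate description, and the tool's own hashing may differ in details such as string encoding.

```python
import hashlib

def md5_prefix(value, length):
    """First `length` hex characters of the md5 hash of `value`."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]

def str_mask(value, mask):
    """Apply a mask: '*' keeps the original character, any other
    mask character overwrites it. Characters beyond the mask are kept."""
    masked = "".join(v if m == "*" else m for v, m in zip(value, mask))
    return masked + value[len(mask):]

masked = str_mask("19800523", "******01")   # day of birth forced to 01
hashed = md5_prefix("1234567", 7)           # 7-character md5 prefix
```

In this sketch the mask ******01 leaves the year and month of a YYYYMMDD date unchanged and overwrites only the final two characters.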

Spelling matters! Especially with the substring bulk replace, please make sure that the substring has no typos, otherwise the target tags will most probably not be processed.

_-30-_
