Import Application

Purpose

The main purpose of the import application is to import arrayImages into TMAJ after they have been scanned by an ArraySlide Scanning machine. 
A secondary, less widely used purpose is to mass-import specimens from a tab-delimited text file.

Background

Array Slide Scanning Machines

The purpose of the ArraySlide Scanning machine is to scan an ArraySlide and convert every spot on the arraySlide into a jpeg image.  
For the machine to be compatible with TMAJ, it must be able to do 2 things:
  1. Produce an individual image for every spot on the ArraySlide.  So a 20 x 20 array should have 400 images.
  2. Have the coordinates of the spot in the filename.  For example, for a 20 x 20 ArraySlide, the machine should generate files that look something like the following:
    01_01_spot.jpg  (x=1 y=1)
    01_02_spot.jpg  (x=1 y=2)
    01_03_spot.jpg  (x=1 y=3)
    01_04_spot.jpg  (x=1 y=4)
    ...
    20_18_spot.jpg  (x=20 y=18)
    20_19_spot.jpg  (x=20 y=19)
    20_20_spot.jpg  (x=20 y=20)

    Note: The ARIOL does not put the coordinates in the filename, but names its files something like 111555101.jpg, and then has an xml file where 111555101.jpg is mapped to some coordinate, like x=5 y=18.  This is also acceptable.  If this is the case you will have to use the Ariol Pre-Import tab before running the ArrayImages tab.

There are a number of ArraySlide Scanning machines on the market.  Here is a List of Compatible ArraySlide Scanning Machines.

The BLISS machine scans an ArraySlide.  Notice the ArraySlide in the microscope.



Importing an ArrayImage


How To Import ArrayImages


Note that ArrayImages are not actually stored in the database.  The database stores only a link to each image, that is, the directory name and the file name.

  1. Scan the image(s) with a scanning machine or a microscope.  The images should come out as a standard image format such as jpeg or gif.
  2. If necessary, manually name the image files.  Note: You can skip this step if you are using a recognized scanning machine such as the Bliss, Acis, or Aperio.  If you don't have a recognized scanning machine, you will have to name each image file manually (or write a short script).
  3. Name the directory properly.  This is imperative, because the program reads the directory name to determine which ArraySlide and ArrayBlock are being imported.
  4. Open the Import application.  Log on to TMAJ and open the Import application.
  5. Ariol Pre-Import Tab.  This step is only required if you are using the ARIOL.  Follow the instructions here: Ariol Pre-Import Instructions.  If you are using any other scanning machine, skip this step.
  6. ArrayImages Tab.  Click the Add button to add the directories you wish to import.  Next, click the Start button.
    Troubleshooting if this step fails:
    • Did you name the directory properly? 
    • If you are using the ARIOL machine, did you use the Ariol Pre-Import tab?
  7. Compress the images.  While this step is optional, you will not be able to use thumbnails or see compressed-small images if you bypass it.  If you are using the ARIOL machine, you must have performed the ARIOL pre-import for this step to work.
  8. Place the directory in the images directory.  This is the final place for all the jpeg images in TMAJ.  If you do not know this directory, contact the TMAJ administrator.  You may move the directory you are importing to the images directory at any time; however, all the preceding steps must be performed on the imported directory.  [More details on the tmaj images directory]



ArrayImages tab

Naming Format for Directories

Each directory represents one scan of one ArraySlide.
The directory that contains the images from an ArraySlide must be named in this specific format:

Format:   TMA_{ArrayBlockID}_CUT_{Z}_SCAN_{Scan-Number}_{Stain}_{Bar-Code}_{Scanning-Machine}_{Short-Description}
Example: TMA_0025_CUT_021_SCAN_01_HE_05011231_BLISS_highRes

On Lengths of Variables: ArrayBlockID, Z, and ScanNumber are each assigned a fixed number of digits, as shown by the Length values in the table below.  So an ArrayBlockID of 17 would be written as 0017.  While this zero-padding is completely optional, it helps a lot when one acquires many directories.  This is because the operating system sorts the directories in alphabetical order, so padded names make it much simpler to locate the directory for a particular arrayslide.

Directories must be unique: No two directories may have the same ArrayBlockID, Z, and ScanNumber.  Thus, if you are rescanning a slide (which will have a fixed ArrayBlockID and Z), and you want to keep the old scan, simply use "02" as the ScanNumber.  If you scan it again, use "03".  If each slide is scanned only once, this is never a problem.
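
The padding rule can be illustrated with a short Python sketch.  It is not part of TMAJ; the function name build_scan_directory_name is hypothetical, and the widths (4, 3, and 2 digits) are taken from the Length values listed for ArrayBlockID, Z, and ScanNumber.

```python
def build_scan_directory_name(array_block_id, z, scan_number, stain,
                              bar_code, scanning_machine, short_description=""):
    """Return a directory name in the TMA_..._CUT_..._SCAN_... format.

    Hypothetical helper: the numeric fields are zero-padded to 4, 3 and 2
    digits so that alphabetical sorting matches numeric order.
    """
    parts = [
        "TMA", f"{array_block_id:04d}",
        "CUT", f"{z:03d}",
        "SCAN", f"{scan_number:02d}",
        stain, bar_code, scanning_machine,
    ]
    # The short description is optional, so it is appended only when given.
    if short_description:
        parts.append(short_description)
    return "_".join(parts)


if __name__ == "__main__":
    print(build_scan_directory_name(25, 21, 1, "HE", "05011231", "BLISS", "highRes"))
```

Without padding, a name beginning TMA_17 would sort before one beginning TMA_2; with padding, TMA_0002 sorts before TMA_0017 as expected.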


Variables in the Directory Name
ArrayBlockID
  Description: The arrayBlockID of the scanned arraySlide
  Length: 4
  Example: 0025

Z
  Description: The cut number of the arraySlide
  Length: 3
  Example: 021

ScanNumber
  Description: Generally, an arraySlide is scanned only once, so this is almost always "01".  Sometimes a slide may be rescanned, because for example the first scan went bad, or the end-user wants to scan at a higher resolution.  Subsequent scans of the same arraySlide would get a "02","03", and so on for this value.
  Length: 2
  Example: 01

Stain
  Description: Either the antibody name the ArraySlide has been stained with (e.g. KI67), or H-and-E.  The ArraySlide is always stained prior to scanning, otherwise the tissue would be clear.
  Length: 1+
  Example: HE

Bar-Code
  Description: Some scanning-machines assign a bar-code to every scan.  This is not necessary for TMAJ, but encouraged.
  Length: 1+
  Example: 159401

Scanning-Machine
  Description: Either STANDARD, BLISS, ACIS, ARIOL, or NIKON.  If you manually named your image files, select STANDARD.
  Length: 1+
  Example: BLISS

Short-Description
  Description: A short description of the scan that was done.  This field is optional, but is HIGHLY SUGGESTED if you have 2 or more scans for the same arrayslide.  If there are 2 or more scans, the short description should include something about why 2 scans of the same arrayslide were needed.  For example, one short description may be lowRes and the other may be highRes.  In this case, the end user could tell that 2 scans were done to see how the scanning machine did at different resolutions.
  Length: any
  Example: highRes
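
As a rough illustration of how these variables can be read back out of a directory name, here is a minimal Python sketch.  It is not TMAJ's own parsing code.  The names ScanDirectory, parse_scan_directory, and find_duplicate_scans are hypothetical, and the sketch assumes that Stain, Bar-Code, and Scanning-Machine contain no underscores and that the Short-Description may be omitted.

```python
import re
from typing import NamedTuple


class ScanDirectory(NamedTuple):
    array_block_id: int
    z: int
    scan_number: int
    stain: str
    bar_code: str
    scanning_machine: str
    short_description: str


# Assumption: only the trailing short description may contain underscores.
_DIRECTORY_PATTERN = re.compile(
    r"^TMA_(\d+)_CUT_(\d+)_SCAN_(\d+)_([^_]+)_([^_]+)_([^_]+)(?:_(.*))?$"
)


def parse_scan_directory(name):
    """Split a scan directory name into its variables."""
    match = _DIRECTORY_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid scan directory name: {name!r}")
    block, z, scan, stain, bar_code, machine, description = match.groups()
    return ScanDirectory(int(block), int(z), int(scan),
                         stain, bar_code, machine, description or "")


def find_duplicate_scans(names):
    """Return pairs of names sharing ArrayBlockID, Z, and ScanNumber."""
    seen = {}
    duplicates = []
    for name in names:
        parsed = parse_scan_directory(name)
        key = (parsed.array_block_id, parsed.z, parsed.scan_number)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates


if __name__ == "__main__":
    print(parse_scan_directory("TMA_0025_CUT_021_SCAN_01_HE_05011231_BLISS_highRes"))
```

Running find_duplicate_scans over the directory names before an import is one way to catch a rescan that was not given a new ScanNumber.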

Naming Format for Image Files

If you are using a recognized scanning machine, the filenames will already be in the proper format.    TMAJ will automatically know which format to use because you specify the scanning-machine when you name the directory (shown above). 

Manually Naming your Files

If you don't have a scanning machine that TMAJ recognizes, you will need to name the files yourself.  (You might want to write a short script to name them programmatically.)
To name your files manually, go through every image file in the directory and name it in this format:
 y_{y-coordinate}_x_{x-coordinate}_{any-extra-information-you-want}.jpg

You should make all your coordinates 2 characters long (i.e. "05" instead of just "5") because the operating system will sort the files in alphabetical order.

Examples for an array with a width of 5 and a height of 20:
y_01_x_01_spot.jpg
y_01_x_02_spot.jpg
y_01_x_03_spot.jpg
y_01_x_04_spot.jpg
y_01_x_05_spot.jpg
y_02_x_01_spot.jpg
y_02_x_02_spot.jpg
y_02_x_03_spot.jpg
y_02_x_04_spot.jpg
y_02_x_05_spot.jpg
y_03_x_01_spot.jpg
y_03_x_02_spot.jpg
y_03_x_03_spot.jpg
y_03_x_04_spot.jpg
.
.
.
y_20_x_03_spot.jpg
y_20_x_04_spot.jpg
y_20_x_05_spot.jpg


Lastly, make sure you select STANDARD as your scanning-machine when you name the directory (see section on naming directory).
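
For those who prefer to script the naming, the following Python sketch generates names in the manual format described above.  The function names standard_filename and standard_filenames are hypothetical, and the sketch only produces the target names; which original file belongs to which spot still has to come from you.

```python
def standard_filename(x, y, extra="spot", extension="jpg"):
    """Return a filename such as y_01_x_02_spot.jpg for spot (x, y)."""
    # Coordinates are padded to 2 digits so files sort in spot order.
    return f"y_{y:02d}_x_{x:02d}_{extra}.{extension}"


def standard_filenames(width, height, extra="spot"):
    """List the expected filenames for a whole array, row by row."""
    return [
        standard_filename(x, y, extra)
        for y in range(1, height + 1)
        for x in range(1, width + 1)
    ]


if __name__ == "__main__":
    # An array with a width of 5 and a height of 20, as in the listing above.
    for name in standard_filenames(5, 20):
        print(name)
```

Once you know which source file corresponds to which spot, each file can be renamed to the generated name, for example with os.rename.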

More Notes on Filenames

Each scanning machine uses a different filename format.  If you are using the ARIOL, the pre-import tab will take care of this.  These notes simply describe how files are imported.  The files must be named in a specific format so that TMAJ can pull out the X and Y coordinates.  This naming is done automatically by the machine that scans the ArraySlide.

Images for each spot are output as jpegs and given a filename based upon their coordinate.  For example, spot (x=5,y=11) would be output as:  0_0_5_11_0_0_0.jpg   (Only the 3rd and 4th positions are significant).

It should be noted that different scanning machines use different output formats.
For example, the BLISS machine may use something like:
    0_0_5_11_0_0_0.jpg
whereas the ACIS machine may use:
    07261614_1_31_2005_A y_11 x_5.jpg
The import program will know which machine has been used to scan the images since the machine name is contained in the directory name, as noted above.  Inside TMAJ, there is code for parsing the X and Y coordinates for each of the scanning machines.
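
To make the idea concrete, here is a minimal Python sketch of per-machine coordinate parsing.  It is not the code inside TMAJ; the patterns are inferred only from the example filenames shown in this manual (BLISS, ACIS, and the manual STANDARD format), and the name parse_coordinates is hypothetical.

```python
import re

# Assumption: one pattern per machine, inferred from the examples above.
_COORDINATE_PATTERNS = {
    # 0_0_5_11_0_0_0.jpg: the 3rd field is x, the 4th field is y.
    "BLISS": re.compile(r"^\d+_\d+_(?P<x>\d+)_(?P<y>\d+)_"),
    # 07261614_1_31_2005_A y_11 x_5.jpg
    "ACIS": re.compile(r"\by_(?P<y>\d+) x_(?P<x>\d+)\."),
    # y_01_x_02_spot.jpg
    "STANDARD": re.compile(r"^y_(?P<y>\d+)_x_(?P<x>\d+)_"),
}


def parse_coordinates(scanning_machine, filename):
    """Return (x, y) for an image file, given the machine that produced it."""
    pattern = _COORDINATE_PATTERNS.get(scanning_machine.upper())
    if pattern is None:
        raise ValueError(f"no filename pattern for machine {scanning_machine!r}")
    match = pattern.search(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match the {scanning_machine} format")
    return int(match.group("x")), int(match.group("y"))


if __name__ == "__main__":
    print(parse_coordinates("BLISS", "0_0_5_11_0_0_0.jpg"))
    print(parse_coordinates("ACIS", "07261614_1_31_2005_A y_11 x_5.jpg"))
```

Keying the patterns by machine name mirrors the way the machine is read from the directory name before any filenames are parsed.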

File Naming on the ARIOL Machine

On most scanning machines, the jpeg filenames already have the x and y coordinates in them.  However, this is not the case with the ARIOL machine, which names its files something like 111555101.jpg and records the coordinate of each file in an xml file.
To put x and y coordinates in the image filenames, one must first go to the 'Ariol Pre-Import' tab.  The program will read the xml file and put the appropriate x and y coordinate information into the jpeg filenames.  The directory will then be ready to be imported just like any other scan.  On other scanning machines this step is skipped.


TMAJ Images Directory

This is where you put all the image directories.   Each of the subdirectories in the TMAJ Images directory contains all the jpeg images from a particular scan.
The WEBSERVER_URL in client.properties points to this directory.  This directory will be accessible through the web, as images are accessed through a URL.

Example URLs:
http://www.myserver.com/tmaj/images/TMA_0017_CUT_020_SCAN_01_HE_0_BLISS/05_02.jpg
http://www.myserver.com/tmaj/images/TMA_0018_CUT_021_SCAN_01_RACEMASE_0_BLISS/05_03.jpg

In this case, the URL for the tmaj image directory would be:
http://www.myserver.com/tmaj/images

This would be mapped somewhere on the server like:
C:\Program Files\Apache Group\Apache2\htdocs\tmaj\images

For this example, in the httpd.conf file under the Apache Webserver installation, you would have DocumentRoot set to:
C:\Program Files\Apache Group\Apache2\htdocs\

The person doing the import will copy the entire scan directory to this directory.  Since the directories are named consistently, they are easy to sort.  The operating system sorts the names alphabetically, so the directories are ordered first by ArrayBlock, then by Z, then by ScanNumber.
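
Because an ArrayImage is stored as a directory name plus a file name, its URL can be assembled from the images directory URL.  The Python sketch below shows one way to do that.  The function name image_url is hypothetical, and percent-encoding the path segments is an assumption of this sketch, not a documented TMAJ behavior.

```python
from urllib.parse import quote


def image_url(images_base_url, scan_directory, filename):
    """Join the images directory URL, a scan directory, and an image file."""
    # quote() leaves letters, digits and underscores alone and encodes
    # characters such as spaces (assumed here to be the desired behavior).
    return "/".join([
        images_base_url.rstrip("/"),
        quote(scan_directory),
        quote(filename),
    ])


if __name__ == "__main__":
    print(image_url("http://www.myserver.com/tmaj/images",
                    "TMA_0017_CUT_020_SCAN_01_HE_0_BLISS",
                    "05_02.jpg"))
```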

MachineScores Tab


Machine Scored Data refers to scores that a machine automatically assigns to an ArrayImage.  (This is in contrast to scores that users assign to images.)  Machine Scored Data used to be called Auto Scored Data.  For example, the Chromavision machine might assign the following scores to one ArrayImage:
BrownArea: 105
BlueArea: 1304
BrownIntensity: 75

Importing MachineScores into the database is optional.  They simply give the user more information about the ArrayImage.  The MachineScores for an ArrayImage can be viewed by clicking the "Info button" on the ArrayImage Panel, or by clicking the "Show Machine Scores Table" on the ArraySlide panel.  Both of the panels are viewed in the Images application after one opens a session.

The machine scored data for an ArraySlide is put into a tab-delimited text file by the Scanning Machine.  Before the file can be imported, it must be renamed properly.  The naming format used is the same as the naming format used for directories of ArrayImages (ArraySlides).  An example machine-scores file may look something like:
    TMA_0017_CUT_060_SCAN_01_HE_01251516_ACIS_Threshold5.txt
The only difference is that the short description section gives a description of the MachineScores, in this case 'Threshold5'.

More details on MachineScores: MachineSessions

A MachineSession represents one Machine-Scoring of an ArraySlide scan.  The entire .txt file represents a MachineSession. It should be noted that it is possible for an ArraySlide Scan to have multiple MachineSessions.  So for example, ArrayImage#123 would end up having several values for BrownArea.
BrownArea: 105  [MachineSessionID#5]
BrownArea: 271 [MachineSessionID#6]

If there is more than one MachineSession per scan, the 'short description' section in the filename should give enough detail to differentiate among the MachineSessions.
For example:
    TMA_0017_CUT_060_SCAN_01_HE_01251516_ACIS_Threshold5.txt
    TMA_0017_CUT_060_SCAN_01_HE_01251516_ACIS_Threshold180.txt
In the above example, the short description shows that each machine session uses a different threshold.
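
The relationship between one scan and several MachineSessions can be sketched in Python as follows.  This is an illustration only, not TMAJ code.  The name group_machine_sessions is hypothetical, and the sketch assumes every machine-scores file ends in .txt, includes a short description, and has no underscores in its Stain, Bar-Code, or Scanning-Machine fields.

```python
import os
from collections import defaultdict


def group_machine_sessions(score_filenames):
    """Map (ArrayBlockID, Z, ScanNumber) to the session descriptions found."""
    sessions = defaultdict(list)
    for filename in score_filenames:
        stem, extension = os.path.splitext(filename)
        fields = stem.split("_")
        # Expected fields: TMA, id, CUT, z, SCAN, n, stain, bar-code,
        # machine, then one or more pieces of short description.
        if (extension.lower() != ".txt" or len(fields) < 10
                or fields[0::2][:3] != ["TMA", "CUT", "SCAN"]):
            raise ValueError(f"not a machine-scores filename: {filename!r}")
        scan_key = (int(fields[1]), int(fields[3]), int(fields[5]))
        sessions[scan_key].append("_".join(fields[9:]))
    return dict(sessions)


if __name__ == "__main__":
    print(group_machine_sessions([
        "TMA_0017_CUT_060_SCAN_01_HE_01251516_ACIS_Threshold5.txt",
        "TMA_0017_CUT_060_SCAN_01_HE_01251516_ACIS_Threshold180.txt",
    ]))
```

For the two example files, both descriptions end up under the same scan, which is why the short description needs to tell the sessions apart.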


Specimens Tab


Specimens can be imported into the database by importing a tab-delimited file.  Each row in the file represents one specimen.  Currently this feature is only supported at JHH.  The tab-delimited file is obtained from PDS (Pathology Data Systems).  This file can be imported as-is in the import application:
1. Launch the import application
2. Choose the Specimens tab.
3. Select the Hospital where the specimens are located.
4. Choose the PDS tab-delimited file. 
5. Click the start button.


Ariol Pre-Import Tab


Jpeg files that are going to be imported into TMAJ must have an x and a y coordinate in their filename designating where in the array they are positioned.  For example, a valid jpeg file may be:
y_06_x_11_spot.jpg
This file would represent spot x=11, y=6.

The ARIOL machine does not have x and y coordinates in its filenames initially.  Rather, it has a file called export.xml which the Ariol Pre-Import application uses to get the x and y coordinates.  The pre-import application then renames all the jpeg files to include an x and y coordinate.

To use the Ariol Pre-Import:
  1. Log on to the import application and go to the ARIOL Pre-Import tab.
  2. Choose the directory in which the jpeg images will be renamed.
  3. Click the Start button.
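
The renaming that the pre-import performs can be pictured with the Python sketch below.  It is not the Ariol Pre-Import code.  The layout of export.xml is not described in this manual, so the sketch takes the filename-to-coordinate mapping as a plain dictionary.  The function names plan_ariol_renames and apply_renames are hypothetical, and the target name format (y_.._x_.._spot) is assumed from the example above.

```python
import os


def plan_ariol_renames(coordinates):
    """Map each original filename to a name carrying its x and y coordinate.

    coordinates: dict of original filename -> (x, y), assumed to have been
    read from export.xml by some other means.
    """
    plan = {}
    used = set()
    for original, (x, y) in sorted(coordinates.items()):
        extension = os.path.splitext(original)[1]
        new_name = f"y_{y:02d}_x_{x:02d}_spot{extension}"
        # Two files claiming the same spot would overwrite each other.
        if new_name in used:
            raise ValueError(f"two files map to spot x={x}, y={y}")
        used.add(new_name)
        plan[original] = new_name
    return plan


def apply_renames(directory, plan):
    """Rename the files inside directory according to the plan."""
    for original, new_name in plan.items():
        os.rename(os.path.join(directory, original),
                  os.path.join(directory, new_name))


if __name__ == "__main__":
    print(plan_ariol_renames({"111555101.jpg": (5, 18)}))
```

Building the full plan before touching any file means a bad mapping is detected before the directory is left half renamed.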


Ariol PreImport Tab.  The Ariol PreImport is used on a directory.





© Copyright 2006 | All Rights Reserved | The Johns Hopkins University