UCAC4 "Streak" processing

This page describes the process used by Skip Gaede to identify the “streak” stars present in some regions of the UCAC4 catalog, principally within a 30-degree radius of the galactic center.

The streaks were identified during the UCAC4 data reduction, and the affected stars were flagged in the catalog in a field holding the object type. A value of 2 indicates a potential streak object. This flag proved insufficient for identifying all of the streaks, and all of the stars in a given streak, because the object type field holds only the most severe error: a value of 2 was often overwritten by a higher value indicating, for example, a problematic proper motion solution. Early analysis of the catalog data by Patrick Chevalley revealed that the streak stars were contained in categories 2, 5, 8, and 9, but that they were only a small subset of the stars in those categories. Proper removal of the streak stars required either a manual process or new software to automate the identification of the streaks. A hybrid approach was finally chosen: manual streak recognition, combined with software to enforce uniform application of the streak linearity and width criteria.

The first step was to create a reduced catalog consisting of category 2, 5, 8, and 9 stars. To further reduce the star density, only stars of magnitude >= 12 were retained for categories 5 and 8, and, to make streaks easier to identify, category 2 stars were made brighter by setting their magnitude to 10. One particularly dense streak region near M8 can be seen above. In Skychart, regions containing anywhere from 1 to 16 streaks were marked by choosing two stars to define a bounding box.
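
The filtering rules above can be sketched in a few lines of Python. This is only an illustration, not the program actually used: the `Star` record, its field names, and the `reduce_catalog` function are hypothetical, and the real UCAC4 binary record layout is not reproduced here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Star:
    # Hypothetical, simplified record; not the UCAC4 binary layout.
    ident: str
    ra: float       # right ascension, degrees
    dec: float      # declination, degrees
    mag: float      # magnitude
    objtype: int    # UCAC4 object type flag

STREAK_TYPES = {2, 5, 8, 9}

def reduce_catalog(stars):
    """Keep only candidate streak stars, following the rules in the text."""
    reduced = []
    for s in stars:
        if s.objtype not in STREAK_TYPES:
            continue                      # not a candidate category
        if s.objtype in (5, 8) and s.mag < 12.0:
            continue                      # categories 5 and 8: keep magnitude >= 12 only
        if s.objtype == 2:
            s = replace(s, mag=10.0)      # make category 2 stars stand out on the chart
        reduced.append(s)
    return reduced
```

Setting the magnitude of category 2 stars to 10 affects only how they are drawn in the reduced catalog; the original catalog values are left untouched because `replace` returns a new record.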

These (approximately 1350) regions were then extracted from the UCAC4 catalog for further processing. The extraction program, u4test, was also modified to extract only stars of types 2, 5, 8, and 9. These sets were read into a program called TOPCAT, where manual streak identification was performed. Each star had the UCAC identifier, coordinates, and object type available for analysis.

In TOPCAT, under higher magnification, streaks come in many “flavors”: some appear to be quite random; some are extremely well defined, with a high star density; some display a branching structure. Examples of these are shown below:

Well-defined streaks, like the one in the middle, are easily dealt with; streaks with a branching structure, like the one on the right, may need to be split into two segments; and the one on the left becomes much more streak-like with a change in the scale of the horizontal axis:

One of the regions, identified in skychart as containing a streak, shows many aspects of the hybrid process.
This is how the data looks when brought into TOPCAT and plotted as a 2D scatter plot:

In this particular set, spotting the streak isn't easy. Marking the category 2 stars with a different color really helps.
This is shown in Figure 2:

The criterion for a streak is that its stars must be collinear within a certain tolerance. Typically I have been using a tolerance of +/- 0.0020 degrees from the streak axis. With high-density streaks, the streak edges are easily determined. With more diffuse streaks, it is easier to establish the streak axis and extent, and to let a software program, distPtLine, determine which stars lie within the established tolerance. TOPCAT provides a freeform selection tool, which was used to define both the extent and the axis of the streak. The true number of streak stars should be greater than or equal to the number chosen by “eyeball”.
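
As a rough illustration of the collinearity test, the sketch below measures the perpendicular distance of each star from the line through two axis points and sorts the stars by the 0.0020 degree tolerance. It is not the distPtLine source: the function names are hypothetical, and treating the coordinates as a flat plane measured in degrees is an assumption that the real program may not share.

```python
import math

TOLERANCE_DEG = 0.0020  # tolerance quoted in the text

def dist_point_line(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b.

    Points are (x, y) tuples in degrees, treated as flat Cartesian
    coordinates (an assumption made for this sketch).
    """
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("axis endpoints must be distinct")
    # |cross product| / |axis length| gives the perpendicular distance.
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def split_by_tolerance(points, a, b, tol=TOLERANCE_DEG):
    """Return (winners, losers): points within tol of the axis, and the rest."""
    winners, losers = [], []
    for p in points:
        (winners if dist_point_line(p, a, b) <= tol else losers).append(p)
    return winners, losers
```

For example, with the axis running from (0, 0) to (1, 0), a star at (0.5, 0.001) lies within the tolerance, while a star at (0.5, 0.003) does not.
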
Selected stars are shown in Figure 3:

There are 13 stars in the “eyeball” set. Two stars at the ends of the streak are passed to a Perl script, which again extracts the stars contained in the bounding box and passes them to distPtLine, which measures the distance of each star from the streak axis and segregates the stars into winners and losers. The results of the analysis can be read back into TOPCAT and overlaid on the original data.
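
The bounding-box extraction step might look roughly like the following Python sketch. It does not reproduce the Perl script: the function names are hypothetical, and the optional `pad` margin is an assumption added for illustration, since the text does not say whether the box is enlarged beyond the two end stars.

```python
def bounding_box(star_a, star_b, pad=0.0):
    """Box (ra_min, ra_max, dec_min, dec_max) spanned by two (ra, dec) positions.

    pad is a hypothetical margin in degrees; RA wrap-around at 0/360
    is ignored in this sketch.
    """
    (ra1, dec1), (ra2, dec2) = star_a, star_b
    return (min(ra1, ra2) - pad, max(ra1, ra2) + pad,
            min(dec1, dec2) - pad, max(dec1, dec2) + pad)

def extract(positions, star_a, star_b, pad=0.0):
    """Return the (ra, dec) positions that fall inside the bounding box."""
    ra_min, ra_max, dec_min, dec_max = bounding_box(star_a, star_b, pad)
    return [(ra, dec) for ra, dec in positions
            if ra_min <= ra <= ra_max and dec_min <= dec <= dec_max]
```

The two end stars can be given in either order, because the box is built from the minimum and maximum of each coordinate.
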
The results of the analysis (19 stars) are shown in Figure 4:

The “eyeball” set is wholly contained in the analytic set.
In Figure 5, the “eyeball” set is in green, and the analytic set in gray:

Finally, the apparent linearity of the streak is highly dependent on the scale used for the x-axis.
Here is Figure 6 adjusted to a more reasonable value, where the stars line up “better”:

About 2000 streaks in the 1300 regions were processed, leading to the identification of 48,372 streak stars.

Here is the same region shown at the top of the page after streak removal:

Version 2 of the UCAC4 catalog for Skychart is available from the download page. It includes this improvement and a correction to the proper motion of some stars.

If you are interested in learning more about this process, you can download a zip file containing a spreadsheet summarizing the data, the Perl script used to do the extraction and measuring for one streak, a batch file that could be used to re-create the streak results, and source code for the modified extraction and measuring software. You will also need to download the UCAC4 binary catalog and the TOPCAT software.