Example: Cross-matching two catalogs
Here is the situation: You have two catalogs A and B, and you want to find the objects in A that are also in B. However, the coordinates in the two catalogs are not exactly the same, so you need to allow for some tolerance in the matching.
First, let’s create two mock catalogs A and B:
import numpy as np
# Create two mock catalogs as numpy arrays
catalogA = np.array([[80.894, 41.269], [120.689, -41.269], [10.689, -41.269]])
catalogB = np.array([[10.688, -41.270], [10.689, -41.270], [10.690, -41.269], [120.690, -41.270]])
Note
If you want to use a format other than a numpy array, see the supported formats for more information.
xmatch()
Then, we can perform the cross-matching with the tolerance of 0.01 degree using the pycorrelator.xmatch()
function.
from pycorrelator import xmatch
result_object = xmatch(catalogA, catalogB, tolerance=0.01)
The result object contains the matching results. Three methods are available to get the results in different formats:
get_dataframe1()
To get the matching results of catalog A, use the pycorrelator.XMatchResult.get_dataframe1()
method:
print(result_object.get_dataframe1())
Expected output:
Ra Dec N_match
0 80.894 41.269 0
1 120.689 -41.269 1
2 10.689 -41.269 3
Here, the column N_match
indicates the number of matches found in catalog B for each object in catalog A.
To find the objects in catalog A that are also in catalog B, set the min_match
parameter to 1:
print(result_object.get_dataframe1(min_match=1))
Expected output:
Ra Dec N_match
1 120.689 -41.269 1
2 10.689 -41.269 3
The method pycorrelator.XMatchResult.get_dataframe1()
returns a pandas DataFrame object.
So if you want to find the objects in catalog A that are not in catalog B, you can do the following
pandas DataFrame operation:
df1 = result_object.get_dataframe1()
print(df1[df1['N_match'] == 0])
Expected output:
Ra Dec N_match
0 80.894 41.269 0
get_dataframe2()
Similarly, to get the matching results of catalog B, use the pycorrelator.XMatchResult.get_dataframe2()
method.
The usage is the same as pycorrelator.XMatchResult.get_dataframe1()
. Just instead of giving the matching results
of each object in catalog A, it gives the matching results of each object in catalog B.
print(result_object.get_dataframe2())
Expected output:
Ra Dec N_match
0 10.688 -41.270 1
1 10.689 -41.270 1
2 10.690 -41.269 1
3 120.690 -41.270 1
get_serial_dataframe()
If you want to get the matching results of both catalogs in a single DataFrame, you can use the
pycorrelator.XMatchResult.get_serial_dataframe()
method. For example:
print(result_object.get_serial_dataframe(min_match=0))
Expected output:
Ra Dec N_match is_cat1
0 80.894 41.269 0 True
1 120.689 -41.269 1 True
3 120.690 -41.270 -1 False
2 10.689 -41.269 3 True
0 10.688 -41.270 -1 False
1 10.689 -41.270 -1 False
2 10.690 -41.269 -1 False
Here, the column is_cat1
indicates whether the object is from catalog A (True) or catalog B (False).
And the column N_match
indicates the number of matches found in catalog B for each object in catalog A.
Each object in catalog A is shown in order as in the input catalog, followed by the matching results of the objects in catalog B.
This means that if an object in catalog B is matches with multiple objects in catalog A, it will be shown multiple times.
And if an object in catalog B is not matched with any object in catalog A, it will not be shown in the output.
Note
The N_match
value is -1 for all objects in catalog B. This is designed for efficiency reasons.
Furthermore, if you want to make catalog B as the ‘primary’ catalog, you can set the reverse
parameter to True
:
print(result_object.get_serial_dataframe(min_match=0, reverse=True))
Expected output:
Ra Dec N_match is_cat1
0 10.688 -41.270 1 False
2 10.689 -41.269 -1 True
1 10.689 -41.270 1 False
2 10.689 -41.269 -1 True
2 10.690 -41.269 1 False
2 10.689 -41.269 -1 True
3 120.690 -41.270 1 False
1 120.689 -41.269 -1 True
Here we can see that the third object (index of 2) in catalog A shown 3 times in the output, because it has 3 matches in catalog B. And the first object (index of 0) in catalog A is not shown in the output, because it has no matches in catalog B.