(1 vs. 8) InterOpNov2021KD < IVOA

Revision 8 - 2023-04-26 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Changed:

<
<

Raffaele D'Abrusco	Introduction	5'	pdf
Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf

>
>

Raffaele D'Abrusco	Introduction	5'	pdf
Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf

Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning. Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	pdf
Rafael Martinez Galarza	Harvesting outliers: data barriers to turn anomalies into discoveries. Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.	7'	pdf

Changed:

<
<

>
>

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Deleted:

<
<

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

META FILEATTACHMENT	attachment="KD-IG_anomalies.pdf" attr="" comment="" date="1635945677" name="KD-IG_anomalies.pdf" path="KD-IG_anomalies.pdf" size="35575806" user="RaffaeleDAbrusco" version="1"
META FILEATTACHMENT	attachment="skoda-bayesian-redshift.pdf" attr="" comment="" date="1635970467" name="skoda-bayesian-redshift.pdf" path="skoda-bayesian-redshift.pdf" size="1855828" user="RaffaeleDAbrusco" version="1"
META FILEATTACHMENT	attachment="Mahabal_IVOA_20211103.pdf" attr="" comment="" date="1635970664" name="Mahabal_IVOA_20211103.pdf" path="Mahabal_IVOA_20211103.pdf" size="60610" user="RaffaeleDAbrusco" version="1"
META FILEATTACHMENT	attachment="slides_session_KDIG_IVOA_2021FallInterOp.pdf" attr="" comment="" date="1636049545" name="slides_session_KDIG_IVOA_2021FallInterOp.pdf" path="slides_session_KDIG_IVOA_2021FallInterOp.pdf" size="197990" user="RaffaeleDAbrusco" version="1"

Revision 7 - 2021-11-04 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Changed:

<
<

Raffaele D'Abrusco	Introduction	5'	pdf
Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf

>
>

Raffaele D'Abrusco	Introduction	5'	pdf
Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf

Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning. Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	pdf
Rafael Martinez Galarza	Harvesting outliers: data barriers to turn anomalies into discoveries. Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.	7'	pdf

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

META FILEATTACHMENT	attachment="KD-IG_anomalies.pdf" attr="" comment="" date="1635945677" name="KD-IG_anomalies.pdf" path="KD-IG_anomalies.pdf" size="35575806" user="RaffaeleDAbrusco" version="1"
META FILEATTACHMENT	attachment="skoda-bayesian-redshift.pdf" attr="" comment="" date="1635970467" name="skoda-bayesian-redshift.pdf" path="skoda-bayesian-redshift.pdf" size="1855828" user="RaffaeleDAbrusco" version="1"
META FILEATTACHMENT	attachment="Mahabal_IVOA_20211103.pdf" attr="" comment="" date="1635970664" name="Mahabal_IVOA_20211103.pdf" path="Mahabal_IVOA_20211103.pdf" size="60610" user="RaffaeleDAbrusco" version="1"

Added:

>
>

META FILEATTACHMENT	attachment="slides_session_KDIG_IVOA_2021FallInterOp.pdf" attr="" comment="" date="1636049545" name="slides_session_KDIG_IVOA_2021FallInterOp.pdf" path="slides_session_KDIG_IVOA_2021FallInterOp.pdf" size="197990" user="RaffaeleDAbrusco" version="1"

Revision 6 - 2021-11-03 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Raffaele D'Abrusco

Introduction

5'

pdf

Changed:

<
<

Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning. Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	pdf

>
>

Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning. Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	pdf

Rafael Martinez Galarza

Harvesting outliers: data barriers to turn anomalies into discoveries

Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.

7'

pdf

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

META FILEATTACHMENT	attachment="KD-IG_anomalies.pdf" attr="" comment="" date="1635945677" name="KD-IG_anomalies.pdf" path="KD-IG_anomalies.pdf" size="35575806" user="RaffaeleDAbrusco" version="1"

Added:

>
>

META FILEATTACHMENT	attachment="skoda-bayesian-redshift.pdf" attr="" comment="" date="1635970467" name="skoda-bayesian-redshift.pdf" path="skoda-bayesian-redshift.pdf" size="1855828" user="RaffaeleDAbrusco" version="1"
META FILEATTACHMENT	attachment="Mahabal_IVOA_20211103.pdf" attr="" comment="" date="1635970664" name="Mahabal_IVOA_20211103.pdf" path="Mahabal_IVOA_20211103.pdf" size="60610" user="RaffaeleDAbrusco" version="1"

Revision 5 - 2021-11-03 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Changed:

<
<

Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	TBD
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning. Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	TBD

>
>

Raffaele D'Abrusco	Introduction	5'	pdf
Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	pdf

Added:

>
>

Petr Skoda

SDSS redshift prediction based on Bayesian Deep Learning

Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.

7'

pdf

Rafael Martinez Galarza

Harvesting outliers: data barriers to turn anomalies into discoveries

Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.

7'

pdf

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

META FILEATTACHMENT	attachment="KD-IG_anomalies.pdf" attr="" comment="" date="1635945677" name="KD-IG_anomalies.pdf" path="KD-IG_anomalies.pdf" size="35575806" user="RaffaeleDAbrusco" version="1"

Revision 4 - 2021-11-03 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Ashish Mahabal	Data Sheets and Model Cards. Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating machine learning and artificial intelligence models for such datasets, and validating them afterwards, can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have recently emerged in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence added caution when they are used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	TBD
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning. Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	TBD

Changed:

<
<

Rafael Martinez Galarza

Harvesting outliers: data barriers to turn anomalies into discoveries

Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.

7'

TBD

>
>

Rafael Martinez Galarza

Harvesting outliers: data barriers to turn anomalies into discoveries

Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.

7'

pdf

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Added:

>
>

META FILEATTACHMENT	attachment="KD-IG_anomalies.pdf" attr="" comment="" date="1635945677" name="KD-IG_anomalies.pdf" path="KD-IG_anomalies.pdf" size="35575806" user="RaffaeleDAbrusco" version="1"

Revision 3 - 2021-10-27 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Changed:

<
<

Time: Wednesday Nov 03 22:00 UTC

>
>

Time: Wednesday Nov 03 22:00 UTC

Changed:

<
<

Speaker(s)	Title and Abstract	Time	Material
Rafael Martinez Galarza	Harvesting outliers: data barriers to turn anomalies into discoveries. Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods, including ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multi-dimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain knowledge with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.	7'	TBD
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning	7'	TBD

>
>

Ashish Mahabal	Data Sheets and Model Cards Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating and subsequently validating machine learning and artificial intelligence models for such datasets can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have emerged recently in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence call for added caution when used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).	7'	TBD
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning Bayesian deep learning is a relatively new approach that is starting to enter astronomy. Unlike the majority of current methods, it provides the uncertainty of its predictions, so we can visually check the suspicious cases with high uncertainty. We demonstrate this in an experiment with spectroscopic redshift prediction from SDSS quasar catalogues. This allowed us to find a number of catalogued quasars that are probably normal stars with a wrong redshift estimate from the SDSS pipeline.	7'	TBD
Rafael Martinez Galarza	Harvesting outliers: data barriers to turn anomalies into discoveries Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods that include ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multidimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain expertise with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.	7'	TBD

Deleted:

<
<

Ashish Mahabal

Data Sheets and Model Cards

Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating and subsequently validating machine learning and artificial intelligence models for such datasets can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have emerged recently in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence call for added caution when used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).

7'

TBD

Changed:

<
<

Moderator: TBD, Notetaker: TBD, Etherpad link

>
>

Moderator: Raffaele D'Abrusco, Notetaker: TBD, Etherpad link

Revision 2 - 2021-10-26 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Speaker(s)	Title and Abstract	Time	Material

Changed:

<
<

Rafael Martinez Galarza

Harvesting outliers: data barriers to turn anomalies into discoveries

Over the last few years astronomers have become increasingly effective

at identifying anomalous objects in large astronomical datasets. So

far, that has meant "finding objects in sparsely populated regions of

a multidimensional feature space". This is done using a number of

methods that include ensemble methods such as random forest

searches and, more recently, generative models that identify anomalies

as those objects that are more difficult for the trained model to

reconstruct. This has produced huge lists of anomalies in diverse datasets

that include SDSS galaxy spectra, Kepler and TESS light curves, and

X-ray catalogs. Yet most of those anomalies are not followed up,

because of a cultural difficulty scientists have in interpreting

multidimensional scatter plots that have no labels on their axes. We

argue that such a cultural barrier can be overcome with novel ways to

combine domain expertise with data visualization, or even by

incorporating domain knowledge directly into the anomaly detection

algorithms. We would like to discuss ways in which VO tools can help

in the identification of anomalies that represent true astronomical

discoveries, by harvesting the currently publicly available catalogs

of anomalies.

7'

TBD

>
>

Rafael Martinez Galarza

Harvesting outliers: data barriers to turn anomalies into discoveries

Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods that include ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multidimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain expertise with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.

7'

TBD

Petr Skoda

SDSS redshift prediction based on Bayesian Deep Learning

7'

TBD

Changed:

<
<

Ashish Mahabal

7'

TBD

>
>

Ashish Mahabal

Data Sheets and Model Cards

Astronomy datasets have been growing, and so have the attempts to use them with a variety of machine learning techniques. While we would like to use all data, data fusion for diverse, uneven, or not-fully-matched datasets can be a challenge. Creating and subsequently validating machine learning and artificial intelligence models for such datasets can be challenging owing to the lack of large labeled training datasets. To address this, two related concepts have emerged recently in data science: datasheets for datasets (Gebru et al., arXiv:1803.09010) and model cards for models (Mitchell et al., arXiv:1810.03993). This is just as each component in the electronics industry comes with a datasheet that describes operating characteristics, test results, recommended use, etc. We recommend that uniform and standardized datasheets advertising similar meta-properties be created for each astronomy dataset, stating not just what “is” but also where each dataset could go, much like Lego blocks. This will enable data fusion and also thwart misguided use of datasets. Similarly, the models that we build will carry not just the usual provenance but also explicit characteristics displaying known biases, and hence call for added caution when used in certain ways. While this trend started in social fields where bias is explicit, it has been successfully applied in the Planetary Data System (PDS) setup for identifying key descriptors in an equally diverse dataspace (https://pds.nasa.gov/datastandards/documents/im/v1/index_1G00.html#10.31%C2%A0%C2%A0class_pds_observation_area).

7'

TBD

Moderator: TBD, Notetaker: TBD, Etherpad link

Revision 1 - 2021-10-25 - RaffaeleDAbrusco

META TOPICPARENT	name="InterOpNov2021"

Knowledge Discovery 1

Time: Wednesday Nov 03 22:00 UTC

Speaker(s)	Title and Abstract	Time	Material
Rafael Martinez Galarza	Harvesting outliers: data barriers to turn anomalies into discoveries Over the last few years astronomers have become increasingly effective at identifying anomalous objects in large astronomical datasets. So far, that has meant "finding objects in sparsely populated regions of a multidimensional feature space". This is done using a number of methods that include ensemble methods such as random forest searches and, more recently, generative models that identify anomalies as those objects that are more difficult for the trained model to reconstruct. This has produced huge lists of anomalies in diverse datasets that include SDSS galaxy spectra, Kepler and TESS light curves, and X-ray catalogs. Yet most of those anomalies are not followed up, because of a cultural difficulty scientists have in interpreting multidimensional scatter plots that have no labels on their axes. We argue that such a cultural barrier can be overcome with novel ways to combine domain expertise with data visualization, or even by incorporating domain knowledge directly into the anomaly detection algorithms. We would like to discuss ways in which VO tools can help in the identification of anomalies that represent true astronomical discoveries, by harvesting the currently publicly available catalogs of anomalies.	7'	TBD
Petr Skoda	SDSS redshift prediction based on Bayesian Deep Learning	7'	TBD
Ashish Mahabal		7'	TBD

Moderator: TBD, Notetaker: TBD, Etherpad link
