# Software mining

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Software_mining
> Markdown URL: https://mediated.wiki/source/Software_mining.md
> Source: https://en.wikipedia.org/wiki/Software_mining
> Source revision: 1352278555
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Application of knowledge discovery in software modernization


**Software mining** is a subfield of [software engineering](/source/Software_engineering) that focuses on extracting and analyzing information from software artifacts stored in repositories such as version control systems, issue trackers, and communication logs. It aims to uncover patterns and actionable insights about software systems and development processes using techniques such as [data mining](/source/Data_mining), [statistical analysis](/source/Statistical_analysis), and [machine learning](/source/Machine_learning), supporting activities like [software maintenance](/source/Software_maintenance), [evolution](/source/Software_evolution), and [quality assessment](/source/Software_quality_management).[1]
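
As a minimal sketch of mining a version control history, the Python example below counts how often each file appears in a commit log, a simple way to spot frequently changed files. The log text is a hypothetical sample shaped like the output of `git log --name-only --pretty=format:--%h`; the function name `count_file_changes` and the file paths are illustrative assumptions, not part of any standard tool.

```python
from collections import Counter

# Hypothetical sample shaped like `git log --name-only --pretty=format:--%h`:
# a marker line per commit, followed by the paths that commit touched.
SAMPLE_LOG = """\
--a1b2c3d
src/parser.py
src/lexer.py

--d4e5f6a
src/parser.py
README.md

--0718293
src/parser.py
"""


def count_file_changes(log_text: str) -> Counter:
    """Count how many commits touched each file path."""
    changes = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        # Skip blank separators and the commit marker lines.
        if not line or line.startswith("--"):
            continue
        changes[line] += 1
    return changes


hotspots = count_file_changes(SAMPLE_LOG)
# The most frequently changed file comes first.
print(hotspots.most_common(1))
```

Change frequency alone is a crude signal; in practice it would typically be combined with other repository data, such as issue tracker links, before drawing conclusions about maintenance effort.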

## Object Management Group (OMG)

The Object Management Group (OMG) developed the [Knowledge Discovery Metamodel](/source/Knowledge_Discovery_Metamodel) (KDM) specification, which defines an [ontology](/source/Ontology) for software assets and their relationships for the purpose of performing knowledge discovery on existing code. KDM provides an integrated representation for capturing application [metadata](/source/Metadata). Another OMG specification, the [Common Warehouse Metamodel](/source/Common_Warehouse_Metamodel), focuses entirely on mining enterprise [metadata](/source/Metadata).

## Software mining and data mining

Software mining is closely related to [data mining](/source/Data_mining): existing software artifacts hold considerable business value that is key to the evolution of software systems. Knowledge discovery from software systems addresses the structure and behavior of a system as well as the data it processes. Rather than mining individual [data sets](/source/Data_set), software mining focuses on [metadata](/source/Metadata).

## Text-mining software tools

[Text mining](/source/Text_mining) software tools enable easy handling of text documents for the purpose of data analysis including automatic model generation and [document classification](/source/Document_classification), [document clustering](/source/Document_clustering), document visualization, dealing with Web documents, and [crawling the Web](/source/Web_crawler).
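
One building block behind document clustering and classification is a numeric measure of how similar two texts are. The sketch below, which assumes a plain bag-of-words model and uses only the Python standard library, compares hypothetical issue report titles by cosine similarity. The helper names `term_vector` and `cosine_similarity` and the sample titles are illustrative; real text-mining tools usually add steps such as stemming, stop-word removal, and term weighting.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphabetic tokens and their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical issue titles, as they might appear in an issue tracker.
crash_a = term_vector("Parser crashes on empty input file")
crash_b = term_vector("Crash in parser when input file is empty")
docs = term_vector("Update installation guide for Windows")

# The two crash reports share several terms; the documentation task shares none.
print(cosine_similarity(crash_a, crash_b) > cosine_similarity(crash_a, docs))
```

A clustering step could then group documents whose pairwise similarity exceeds a chosen threshold; the threshold itself would be a tuning decision rather than a fixed value.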

## Levels of software mining

*Knowledge discovery in software* is related to the concept of [reverse engineering](/source/Reverse_engineering). Software mining addresses the structure and behavior of a software system as well as the data it processes.

Mining software systems may happen at various *levels*:

- program level (individual statements and variables)

- [design pattern](/source/Design_pattern) level

- [call graph](/source/Call_graph) level (individual procedures and their relationships)

- architectural level (subsystems and their interfaces)

- data level (individual columns and attributes of data stores)

- application level (key data items and their flow through the applications)

- business level (domain concepts, business rules and their implementation in code)
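
To illustrate the call graph level from the list above, the following Python sketch recovers a caller-to-callee mapping from source code using the standard `ast` module. It is a simplified illustration under stated assumptions: the sample functions are hypothetical, only direct calls by plain name are recorded (method calls such as `text.split()` are ignored), and calls inside nested functions would also be attributed to the enclosing function.

```python
import ast
from collections import defaultdict

# Hypothetical source code to analyze.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
'''


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the set of plain names it calls."""
    tree = ast.parse(source)
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        callees = graph[func.name]  # creates an entry even if nothing is called
        for node in ast.walk(func):
            # Record only calls of the form name(...); attribute calls are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callees.add(node.func.id)
    return dict(graph)


call_graph = build_call_graph(SOURCE)
for caller, callees in sorted(call_graph.items()):
    print(caller, "->", sorted(callees))
```

The same parsed tree also serves the program level, since it exposes individual statements and variables; higher levels such as the architectural or business level generally require additional information beyond the syntax tree.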

## Forms of representing the results of software mining

- [data model](/source/Data_model)

- [metadata](/source/Metadata)

- [metamodels](/source/Metamodeling)

- [ontology](/source/Ontology)

- [Knowledge representation](/source/Knowledge_representation)

- [business rule](/source/Business_rule)

- [Knowledge Discovery Metamodel](/source/Knowledge_Discovery_Metamodel) (KDM)

- [Business Process Modeling Notation](/source/Business_Process_Modeling_Notation) (BPMN)

- [intermediate representation](/source/Intermediate_representation)

- [Resource Description Framework](/source/Resource_Description_Framework) (RDF)

- [abstract syntax tree](/source/Abstract_syntax_tree) (AST)

- [software metrics](/source/Software_metric)

- [graphical user interfaces](/source/Graphical_user_interface)

## See also

- [Mining Software Repositories](/source/Mining_Software_Repositories)

## References

1. **[^](#cite_ref-1)** Siddiqui, Tamanna; Ahmad, Ausaf (2017). "Data mining tools and techniques for mining software repositories: A systematic review". *Big Data Analytics: Proceedings of CSI 2015*. pp. 717–726. [doi](/source/Doi_(identifier)):[10.1007/978-981-10-6620-7_70](https://doi.org/10.1007%2F978-981-10-6620-7_70).

---
Adapted from the Wikipedia article [Software mining](https://en.wikipedia.org/wiki/Software_mining) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Software_mining?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
