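How do I convert a UCI .data file to ARFF?

Don't write the ARFF file by hand; it is tedious. Make a .csv file and let Weka save it as ARFF.

Open the .data file in a text editor, copy the field names into the first line, and save it as .csv.

For example: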
@RELATION cars

@ATTRIBUTE buying {vhigh,high,med,low}
@ATTRIBUTE maint {vhigh,high,med,low}
@ATTRIBUTE doors {2,3,4,5more}
@ATTRIBUTE persons {2,4,more}
@ATTRIBUTE lug-boot {small,med,big}
@ATTRIBUTE safety {low,med,high}
@ATTRIBUTE class {unacc,acc,good,vgood}

@DATA
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
vhigh,vhigh,2,2,med,low,unacc
vhigh,vhigh,2,2,med,med,unacc
vhigh,vhigh,2,2,med,high,unacc
vhigh,vhigh,2,2,big,low,unacc
vhigh,vhigh,2,2,big,med,unacc

Then just open it in Weka.
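The "copy the field names into the first line" step can also be scripted. The following is a minimal sketch, assuming the .data file is comma-separated with no header row; the function name `data_to_csv` and the field list are illustrative and not part of Weka or the UCI files.

```python
import csv
import io


def data_to_csv(data_text, field_names):
    """Return CSV text: a header row followed by the rows of a comma-separated .data file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(field_names)
    for row in csv.reader(io.StringIO(data_text)):
        if row:  # skip blank lines
            writer.writerow([value.strip() for value in row])
    return out.getvalue()


# Field names taken from the example above; the two rows are the first two data lines.
FIELDS = ["buying", "maint", "doors", "persons", "lug-boot", "safety", "class"]
sample = "vhigh,vhigh,2,2,small,low,unacc\nvhigh,vhigh,2,2,small,med,unacc\n"
csv_text = data_to_csv(sample, FIELDS)
print(csv_text)
```

For a real file, read its contents into `data_text`, write the returned string to a .csv file, and open that file in Weka.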

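Data mining short paper: my draft version of the data mining course thesis

Application of Classification Mining in Image Recognition

Wei Guohua (University of Science and Technology of China, Master of Software Engineering program, Shanghai Class 4, Shanghai 200333)

Zhu Ming (University of Science and Technology of China, Department of Automation, Hefei, Anhui 230051)

Abstract: A video processing and recognition system is a fairly complex computer software system. Its processing and recognition results need a sound classification method for credibility and an automated classification tool. At present, some systems still require manual intervention for the whole system to run and execute completely. However, manual intervention involves a heavy workload, its judgments are strongly affected by human factors, and it suffers from visual fatigue and slow inspection speed, all of which greatly interfere with the final results. Here we introduce a method that automatically classifies certain specific image segments by color difference. Using randomly collected outdoor image sample instances and their known feature data, we classify the image segments and evaluate the results objectively, providing a basis for improving the recognition rate.

Keywords: data mining; image processing; classification mining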

Classification mining in the field of image recognition

Wei Guo Hua1,  Zhu Ming2

(1. Department of Automation, University of Science and Technology of China, Shanghai, China; 2. Department of Automation, University of Science and Technology of China, Hefei, China)

Abstract: Image processing and identification systems are complex software systems. They need an effective classification method and an automatic classification tool to keep their results credible. Today, many image processing and intelligent systems still need manual intervention to run accurately and completely. However, manual intervention also has a considerable negative influence on the final result and can make things worse. Here we introduce a method of automatic classification that uses image segments drawn randomly from a database of 7 outdoor images, classifies them with a data mining classifier, and evaluates the results to see how classification and the relevant algorithms can help improve the accuracy of image recognition.

Keywords: Data mining, Image processing, Classification


Image Segmentation Data Set

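Originally I wanted to use the earlier Human Activity Recognition Using Smartphones Data Set for my data mining course paper, but I ran into some problems converting that dataset to Weka's ARFF format, so I gave up on it. In the end I will write the paper using the Image Segmentation Data Set below, which is also more relevant and closer to my work.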
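For reference, a conversion like the one abandoned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the feature file is whitespace-separated with one record per line, every feature is numeric, and the labels are available as strings. The helper names `quote` and `to_arff`, the two feature names, and the sample values are hypothetical. Attribute names are quoted because names containing parentheses or commas would otherwise not parse as ARFF.

```python
def quote(name):
    """Quote an ARFF name so that special characters (spaces, commas, parentheses) are safe."""
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_arff(relation, feature_names, rows, labels):
    """Build ARFF text with NUMERIC features and one nominal class attribute."""
    classes = sorted(set(labels))
    lines = ["@RELATION " + quote(relation), ""]
    for name in feature_names:
        lines.append("@ATTRIBUTE " + quote(name) + " NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        lines.append(",".join(str(value) for value in row) + "," + label)
    return "\n".join(lines) + "\n"


# Illustrative input only: two made-up records with two assumed feature names.
names = ["tBodyAcc-mean()-X", "tBodyAcc-mean()-Y"]
rows = [line.split() for line in "0.28 -0.02\n0.27 -0.01\n".splitlines()]
labels = ["WALKING", "STANDING"]
arff = to_arff("har", names, rows, labels)
print(arff)
```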

This dataset comes from: http://archive.ics.uci.edu/ml/datasets/Image+Segmentation


Human Activity Recognition Using Smartphones Data Set

From: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Download: Data Folder, Data Set Description

Abstract: Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.

Data Set Characteristics: Multivariate, Time-Series
Number of Instances: 10299
Area: Computer
Attribute Characteristics: N/A
Number of Attributes: 561
Date Donated: 2012-12-10
Associated Tasks: Classification, Clustering
Missing Values? N/A
Number of Web Hits: 33493

Source:

Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto.
Smartlab – Non Linear Complex Systems Laboratory
DITEN – Università degli Studi di Genova, Genoa I-16145, Italy.
activityrecognition ‘@’ smartlab.ws
www.smartlab.ws

Data Set Information:

The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data.

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
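The windowing arithmetic above (2.56 sec at 50Hz gives 128 readings, and 50% overlap gives a step of 64 readings) can be illustrated with a short sketch. It is not the dataset authors' code; the function name `sliding_windows` and the synthetic signal are assumptions made for illustration, and the noise filtering and Butterworth separation steps are not shown.

```python
def sliding_windows(samples, rate_hz=50, window_sec=2.56, overlap=0.5):
    """Split a sequence of readings into fixed-width, overlapping windows."""
    size = round(rate_hz * window_sec)   # 50 Hz * 2.56 s = 128 readings per window
    step = round(size * (1 - overlap))   # 50% overlap -> advance 64 readings each time
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


# Synthetic signal: 320 readings, i.e. 6.4 seconds at 50 Hz.
signal = list(range(320))
windows = sliding_windows(signal)
print(len(windows), len(windows[0]), windows[1][0])  # prints: 4 128 64
```

Readings left over at the end that do not fill a whole window are dropped in this sketch; whether the original preprocessing does the same is not stated in the description.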

Check the README.txt file for further details about this dataset.

Attribute Information:

For each record in the dataset it is provided:
– Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
– Triaxial Angular velocity from the gyroscope.
– A 561-feature vector with time and frequency domain variables.
– Its activity label.
– An identifier of the subject who carried out the experiment.

Relevant Papers:

N/A

Citation Request:

[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012

Preparing a thesis of data mining

We have finished our last data mining class, and Professor Zhu (朱明) would like to close the course with a thesis on data mining. Ideally the data should be relevant to our jobs.

Here are the detailed requirements:

The five steps for writing this thesis:

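1)      Business requirements analysis. For example: studying XXX data in order to help with YYY, bla bla bla…

2)      Define the data mining task. For example: find ZZZ data. It must include a key sentence: this is a classification/association/clustering/outlier mining task.

3)      Data preparation and preprocessing. State where the data comes from.

4)      The data mining algorithm applied. Illustrate with screenshots of the software (Weka).

5)      Evaluate the results. Analyze whether they solve the problem raised in the requirements.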

Algorithm & data mining tools download:

Weka: http://sourceforge.net/projects/weka/?source=dlp

Dataset for analysis (if you are using a public dataset for data mining):

Search Google with keywords such as: uci machine-learning database

Data downloads: http://archive.ics.uci.edu/ml/

Draft submission: April 20, 2013

Due date: April 25, 2013