Compare commits

3 Commits

| Author | SHA1 | Date |
|---|---|---|
| artieyue | 491e52eb81 | |
| artieyue | a1fa4d8a31 | |
| artieyue | 6cfca9f940 | |

57 README.md

@@ -1,56 +1 @@
# BNU OpenCT Community / OpenBrain Project

## Project Background

The BNU OpenCT Community / OpenBrain project is a key sub-project of the OpenCT community. Set against the backdrop of the metaverse and built on the ChatGPT large language model, it aims to construct a multimodal comprehensive student evaluation system. The system automatically scores and reports on the large-scale data students generate in learning and testing tasks, forming a question-answering assistant for student learning and evaluation. The project draws on the research resources and technical strengths of Beijing Normal University and combines artificial intelligence with big-data technology to provide an intelligent comprehensive student evaluation solution for the education sector.

## Project Content

The project covers the following areas:

### Multimodal Data Processing

- Familiarize with and organize student learning and testing data collected from a variety of educational resources and platforms, including text, audio, video, and behavioral data.
- Preprocess the collected data, including cleaning, format conversion, and annotation, to ensure data quality and consistency.

### Building Models for Data Mining and Analysis

- Build data-mining models based on the ChatGPT large language model to analyze student data.

### Generating Comprehensive Student Evaluation Reports

- Automatically generate detailed comprehensive student evaluation reports from the scoring results.
- Reports cover performance analysis, learning suggestions, and behavioral evaluation, serving as a reference for teachers and students.

## Project Advantages

- **Rich academic resources**: the project draws on the academic resources of teams at Beijing Normal University, Tsinghua University, University of Science and Technology Beijing, Hebei Normal University, and others, with first-class research teams and a strong record of results.
- **Open-source community support**: as an open-source project, participants can access the latest technical documentation and code and take part in development and maintenance.
- **Innovative technology**: using ChatGPT and big-data technology to provide an intelligent, automated comprehensive student evaluation solution.

## How to Participate

We welcome computer-science students from across the country to join the project in the following ways:

1. **Open-source contributions**: browse the project code and documentation, file issues, or contribute code.
2. **Technical exchange and collaboration**: join our online QQ group (389801885) to exchange ideas and collaborate with other developers and researchers.
3. **Paper writing**: take part in writing and submitting the project's academic papers under the guidance of the supervising mentors, gaining valuable research experience.

## Goals

- Build a library of multimodal comprehensive student evaluation models based on the ChatGPT large language model, capable of automatically scoring and reporting on the large-scale data students generate in learning and testing tasks.
- Provide an intelligent question-answering assistant for student learning and evaluation, improving the learning experience for students and teaching effectiveness for teachers.

## Difficulty

Medium

## Deliverables

- Build data-mining models and carry out data analysis and mining.
- Write detailed analysis reports.
- Test and optimize the models to ensure performance and accuracy.

## Skill Requirements

- Proficiency in Python or R.
- Familiarity with common machine-learning and deep-learning algorithms.
- Knowledge of natural language processing and big-data analysis techniques.
- Familiarity with the characteristics and analysis needs of educational data.
- Experience writing reports in Markdown.

## Mentor

- Luo Haifeng (contact: luohaifeng@tsinghua.edu.cn)

## Closing Remarks

Through open source and collaboration, the BNU OpenCT Community / OpenBrain project is committed to advancing intelligent, automated educational evaluation systems. We look forward to more computer-science students joining us to empower education and contribute to technological innovation. Let's explore the future of comprehensive student evaluation together!

# openbrain
@@ -1,55 +0,0 @@

#### Process Data Processing

Use process data to fill in the end times missing from the outcome data, so that the answer-duration analysis in the outcome-data analysis can be redone.

- split.py - splits the full raw data captured from the system into the parts belonging to each answer sheet
- get_stoptime_a.py - uses process data to fill in the end times missing from paper A's outcome data
- get_stoptime_z.py - uses process data to fill in the end times missing from paper Z's outcome data
- statistic_time.py - redoes the answer-duration statistics

Construct a dataset of individualized student feature indicators from the raw process data, producing six feature values for each page of questions each student answers.

- preprocess_processdata_A.py - preprocesses the raw process data, adding feature markers to each data record
- getcharacter_processdata_A.py - analyzes the preprocessed process data to obtain the student feature-indicator dataset

#### Feature Values

##### Time: time spent on each page

- Only the sum of all "effective stays" is counted; an "effective stay" is defined as staying on the page for longer than 1 second.
- Currently only per-page times can be told apart. In a few cases one page holds several questions, but students are then unlikely to answer them in order, so per-question splitting is abandoned and only the time spent on each page is analyzed (likewise for the other feature values).
- Code logic for splitting the time per question page:
  - Add two columns, page and page_time; on the row where each page ends (i.e. the first row whose page changes after a page finishes), record the duration of that page.
  - If the user id or task_name changes, set the current time as starttime (i.e. the start time of the first question).
  - If the page changes and page != 1 (i.e. page != old_page and page != 1), set the current time as old_page's stoptime, subtract the recorded starttime, append old_page to the page column, and append the time difference to the page_time column.
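The splitting rules above can be sketched as a small standalone pass over action records. The record fields, the input shape, and the function name `mark_page_times` are assumptions for illustration; the reset/accumulate rules and the 1-second threshold follow the description:

```python
from datetime import datetime

def mark_page_times(rows):
    """Annotate each action row with the page just left and the time spent on it.

    rows: list of dicts with 'id', 'task_name', 'page', 'timestamp'
    (hypothetical minimal schema). Returns two parallel lists for the new
    page and page_time columns, with "" where nothing is recorded.
    """
    page_col, time_col = [], []
    old_id = old_task = None
    old_page = 0
    starttime = None
    for r in rows:
        t = datetime.fromisoformat(r['timestamp'])
        if r['id'] != old_id or r['task_name'] != old_task:
            # new student or new task: reset the start time
            starttime = t
            page_col.append(""); time_col.append("")
        elif r['page'] != old_page and r['page'] != 1:
            # page changed: the page just left ends on this row
            delta = (t - starttime).total_seconds()
            starttime = t
            page_col.append(old_page)
            time_col.append(delta if delta > 1 else "")  # stays <= 1 s are not recorded
        else:
            page_col.append(""); time_col.append("")
        old_id, old_task, old_page = r['id'], r['task_name'], r['page']
    return page_col, time_col
```

Feeding it a short sequence of actions for one student yields the page/page_time annotations on the rows where the page changes, exactly as the rules describe.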
##### Repeat: number of times the student returns to view a page after completing it

- Counts the times, beyond the normal in-order pass, that a student answering other pages comes back to this page for reference; each return to the page is one visit.
- Because the answering system cannot jump between questions, consulting page 3 while answering page 5 requires a brief stop on page 4; that visit to page 4 is meaningless, so a stay on a page only counts as one effective return if it lasts longer than 1 second.
- Can be derived directly from the page and pagetime columns obtained in the previous step.

##### Revise: number of edits on each page

- For multiple-choice questions: changing the answer (easy to detect).
- For fill-in questions: one consecutive deletion sequence counts as one edit (harder to detect, because the data is read row by row and this actually needs 3 rows).
- Add an edit column marking whether each row is an edit action, then aggregate over that column.
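The fill-in rule above (one consecutive deletion run counts as one edit, decided from a 3-row window) can be sketched as follows; the `(page, answer_text)` snapshot shape and the function name are hypothetical:

```python
def mark_fillin_edits(snapshots):
    """Mark fill-in edit rows over a 3-row sliding window.

    snapshots: list of (page, answer_text) tuples, one per logged action
    (hypothetical minimal schema). Row i-1 is marked as an edit when its
    answer shrank relative to row i-2 and the deletion run then ended:
    either the text grew again on the same page, or the page changed.
    """
    edits = [0] * len(snapshots)
    for i in range(2, len(snapshots)):
        (p0, a0), (p1, a1), (p2, a2) = snapshots[i-2], snapshots[i-1], snapshots[i]
        if p1 == p0 and len(a1) < len(a0):                     # the middle row deleted text
            if p2 != p1 or (p2 == p1 and len(a2) > len(a1)):   # the run ended
                edits[i-1] = 1
    return edits
```

Because the middle row of each window is the one being classified, the output is one position "behind" the scan, which is why the full script rotates its result lists forward by one at the end.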
##### Before: time from entering a page to the first answer on it

- Locate the time difference between each page's first edit and the previous action.
- Add a before column; on each page's first edit row, write the time difference between that row and the previous row, then read the values off that column.

##### After: time from first completing a page's answers to leaving the page

- The answering system is designed so that the next page only opens once the previous page is completed; therefore the action just before the first entry into a page is necessarily the completion action of the previous page.
- Locate the time difference between the first entry into a page and the previous action.
- Add an after column; on each page's first entry row, write the time difference between that row and the previous row, then read the values off that column.

##### AR: number of times the student returns to a page and edits it after completing it

- The answering system is designed so that the next page only opens once the previous page is completed; therefore the action just before the first entry into a page is necessarily the completion action of the previous page.
- Use each row's own page value (not the page column) to identify rows that return to a completed page, then intersect with the edit column obtained above.
- Add an AR column and aggregate over it.
- Total edits = edits after completion + edits before completion.
- Identifying return-after-completion rows and first-entry actions can be done in the same pass.
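A minimal sketch of the AR rule: an edit counts only when the row's page number is below the largest page reached so far. The `(page, edit)` input shape and the function name are assumptions:

```python
def mark_AR(rows):
    """rows: list of (page, edit) pairs in action order (hypothetical shape).

    AR = 1 for edit rows whose page is below the largest page already
    visited, i.e. edits made after returning to a completed page.
    """
    ar = []
    max_page = 0
    for page, edit in rows:
        if page > max_page:
            max_page = page       # advance the frontier first
        ar.append(1 if edit == 1 and page < max_page else 0)
    return ar
```

Updating `max_page` before the comparison means an edit on the current frontier page is an ordinary edit (AR = 0); only edits strictly behind the frontier count as returns.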
@@ -1,77 +0,0 @@

```python
# get_stoptime_a.py
import pandas as pd
from datetime import datetime

time_dict = {}
stop_time_dict = {}
data = pd.read_excel(r'A.xlsx')

datetimeFormat = '%Y-%m-%dT%H:%M:%S.%f+08:00'
datetimeFormat2 = '%Y-%m-%dT%H:%M:%S+08:00'

# collect every process-data timestamp per ticket_id
for index, row in data.iterrows():
    id = str(row['ticket_id'])
    timestamp = str(row['timestamp'])
    if id not in time_dict.keys():
        time_dict[id] = []
        time_dict[id].append(timestamp)
    else:
        time_dict[id].append(timestamp)

# the last recorded timestamp of each ticket serves as its stop time
for key, value in time_dict.items():
    stop_time_dict[key] = value[-1]

data2 = pd.read_excel(r'a_out_0714.xlsx')

stoptime_new_list = []
time_new_list = []
empty = 0

for index, row in data2.iterrows():
    print(index)
    id = str(row['ticket_id'])
    P1_CODE = row['P3_CODE']            # paper A uses the P3/MM60311 columns
    MM60101_CODE = row['MM60311_CODE']
    if pd.isna(row['stop_time']) and (int(P1_CODE) != 99 or int(MM60101_CODE) != 99):
        # stop_time missing but the student did answer: fill it from process data
        if id in stop_time_dict.keys():
            timestamp = stop_time_dict[id]
            stoptime_new_list.append(timestamp)
            try:
                date2 = datetime.strptime(str(timestamp), datetimeFormat)
            except ValueError:
                date2 = datetime.strptime(str(timestamp), datetimeFormat2)
        else:
            empty = empty + 1
            stoptime_new_list.append("")
            time_new_list.append("")
            continue
    elif pd.isna(row['stop_time']) and int(P1_CODE) == 99 and int(MM60101_CODE) == 99:
        # stop_time missing and nothing answered: leave the row empty
        empty = empty + 1
        stoptime_new_list.append("")
        time_new_list.append("")
        continue
    else:
        # stop_time already present: no new stop time needed
        stoptime_new_list.append("")
        try:
            date2 = datetime.strptime(str(row['stop_time']), datetimeFormat)
        except ValueError:
            date2 = datetime.strptime(str(row['stop_time']), datetimeFormat2)
    try:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat)
    except ValueError:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat2)
    delta = date2 - date1
    miao = delta.seconds           # duration in seconds
    fen = round(miao/60, 2)        # duration in minutes
    time_new_list.append(fen)

col_name = data2.columns.tolist()

col_name.insert(col_name.index('stop_time')+1, 'stoptime_new')
data2 = data2.reindex(columns=col_name)
data2['stoptime_new'] = stoptime_new_list

col_name.insert(col_name.index('cost_time')+1, 'time_new')
data2 = data2.reindex(columns=col_name)
data2['time_new'] = time_new_list

data2.to_excel('a_out_0719.xlsx')
```
@@ -1,77 +0,0 @@

```python
# get_stoptime_z.py
import pandas as pd
from datetime import datetime

time_dict = {}
stop_time_dict = {}
data = pd.read_excel(r'Z.xlsx')

datetimeFormat = '%Y-%m-%dT%H:%M:%S.%f+08:00'
datetimeFormat2 = '%Y-%m-%dT%H:%M:%S+08:00'

# collect every process-data timestamp per ticket_id
for index, row in data.iterrows():
    id = str(row['ticket_id'])
    timestamp = str(row['timestamp'])
    if id not in time_dict.keys():
        time_dict[id] = []
        time_dict[id].append(timestamp)
    else:
        time_dict[id].append(timestamp)

# the last recorded timestamp of each ticket serves as its stop time
for key, value in time_dict.items():
    stop_time_dict[key] = value[-1]

data2 = pd.read_excel(r'z_out_0715.xlsx')

stoptime_new_list = []
time_new_list = []
empty = 0

for index, row in data2.iterrows():
    print(index)
    id = str(row['ticket_id'])
    P1_CODE = row['P1_CODE']            # paper Z uses the P1/MM60101 columns
    MM60101_CODE = row['MM60101_CODE']
    if pd.isna(row['stop_time']) and (int(P1_CODE) != 99 or int(MM60101_CODE) != 99):
        # stop_time missing but the student did answer: fill it from process data
        if id in stop_time_dict.keys():
            timestamp = stop_time_dict[id]
            stoptime_new_list.append(timestamp)
            try:
                date2 = datetime.strptime(str(timestamp), datetimeFormat)
            except ValueError:
                date2 = datetime.strptime(str(timestamp), datetimeFormat2)
        else:
            empty = empty + 1
            stoptime_new_list.append("")
            time_new_list.append("")
            continue
    elif pd.isna(row['stop_time']) and int(P1_CODE) == 99 and int(MM60101_CODE) == 99:
        # stop_time missing and nothing answered: leave the row empty
        empty = empty + 1
        stoptime_new_list.append("")
        time_new_list.append("")
        continue
    else:
        # stop_time already present: no new stop time needed
        stoptime_new_list.append("")
        try:
            date2 = datetime.strptime(str(row['stop_time']), datetimeFormat)
        except ValueError:
            date2 = datetime.strptime(str(row['stop_time']), datetimeFormat2)
    try:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat)
    except ValueError:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat2)
    delta = date2 - date1
    miao = delta.seconds           # duration in seconds
    fen = round(miao/60, 2)        # duration in minutes
    time_new_list.append(fen)

col_name = data2.columns.tolist()

col_name.insert(col_name.index('stop_time')+1, 'stoptime_new')
data2 = data2.reindex(columns=col_name)
data2['stoptime_new'] = stoptime_new_list

col_name.insert(col_name.index('cost_time')+1, 'time_new')
data2 = data2.reindex(columns=col_name)
data2['time_new'] = time_new_list

data2.to_excel('z_out_0719.xlsx')
```
@@ -1,116 +0,0 @@

```python
# getcharacter_processdata_A.py
# ==========================================================================
# Processing process data to obtain student feature values - step 2
# This script takes the preprocessed [sorted] student answering process data
# as its raw input document and extracts features, obtaining the following
# per-page feature values for each student:
# Time:   time spent on each page (total; each stay longer than 1 second)
# Revise: number of edits on each page (total)
# Repeat: number of stays (total; each longer than 1 second)
# Before: time before starting to answer each question page
# After:  time after finishing each question page
# AR:     number of edits made after returning to a completed page
# ==========================================================================

import json
import pandas as pd

data = pd.read_excel(r"A_demo_out.xlsx")
id_dict = {}  # all results are collected here; it converts directly to a pandas object and then to an Excel file
for index, row in data.iterrows():
    # read row by row
    id = str(row["ticket_id"])
    task_name = str(row["task_name"])
    if id not in id_dict.keys():
        id_dict[id] = {}
    id_com = id_dict[id]
    # Read the page information and the key values produced by preprocessing;
    # cells that were empty in the original sheet become -1 here.
    edit = int(row["edit"])
    AR = int(row["AR"])
    answer = json.loads(row['task_answer'])
    frame = answer["frame"]
    if frame is None:
        page_now = -1
    else:
        page_now = int(frame["data"]["page"])

    if pd.isnull(row["page"]):
        page = -1
    else:
        page = int(row["page"])

    if pd.isnull(row["pagetime"]):
        pagetime = -1
    else:
        pagetime = float(row["pagetime"])

    if pd.isnull(row["before"]):
        before = -1
    else:
        before = float(row["before"])

    if pd.isnull(row["after"]):
        after = -1
    else:
        after = float(row["after"])

    if task_name == "运动会问题":
        if page != -1 and pagetime != -1:
            column_time = "sports_Time_A" + str(page)      # total stay time
            column_repeat = "sports_Repeat_A" + str(page)  # number of return stays
            if column_time in id_com.keys():
                id_com[column_time] = id_com[column_time] + pagetime
                id_com[column_repeat] = id_com[column_repeat] + 1
            else:
                id_com[column_time] = pagetime
                id_com[column_repeat] = 0
        if page_now != -1:
            column_revise = "sports_Revise_A" + str(page_now)  # total edits
            if column_revise in id_com.keys():
                id_com[column_revise] = id_com[column_revise] + edit
            else:
                id_com[column_revise] = 0
            column_AR = "sports_AR_A" + str(page_now)  # total edits after returning
            if column_AR in id_com.keys():
                id_com[column_AR] = id_com[column_AR] + AR
            else:
                id_com[column_AR] = 0
        if before != -1:
            column_before = "sports_before_A" + str(page_now)
            id_com[column_before] = before
        if after != -1:
            column_after = "sports_after_A" + str(page_now-1)
            id_com[column_after] = after
    elif task_name == "生活水平问题":
        if page != -1 and pagetime != -1:
            column_time = "life_Time_A" + str(page)      # total stay time
            column_repeat = "life_Repeat_A" + str(page)  # number of return stays
            if column_time in id_com.keys():
                id_com[column_time] = id_com[column_time] + pagetime
                id_com[column_repeat] = id_com[column_repeat] + 1
            else:
                id_com[column_time] = pagetime
                id_com[column_repeat] = 0
        if page_now != -1:
            column_revise = "life_Revise_A" + str(page_now)  # total edits
            if column_revise in id_com.keys():
                id_com[column_revise] = id_com[column_revise] + edit
            else:
                id_com[column_revise] = 0
            column_AR = "life_AR_A" + str(page_now)  # total edits after returning
            if column_AR in id_com.keys():
                id_com[column_AR] = id_com[column_AR] + AR
            else:
                id_com[column_AR] = 0
        if before != -1:
            column_before = "life_before_A" + str(page_now)
            id_com[column_before] = before
        if after != -1:
            column_after = "life_after_A" + str(page_now-1)
            id_com[column_after] = after
    print(id_com)
    id_dict[id] = id_com


data_df = pd.DataFrame(id_dict).T
data_df = data_df.fillna(-1)  # fill empty cells with -1
data_df.to_excel("A_demo_statre.xlsx")
```
@@ -1,367 +0,0 @@

```python
# preprocess_processdata_A.py
# ==========================================================================
# Processing process data to obtain student feature values - step 1
# This script takes the collected [sorted] student answering process data as
# its raw input document and preprocesses it.
# After preprocessing, the new document has 6 extra columns on top of the raw
# data, which become the input of the next step.
#
# New columns:
# page: a student may perform a series of actions on the same page in a row;
#     when they finish on one page and switch to another, the number of the
#     page just left is written into the page column of the first action row
#     after the switch. Stays shorter than 1 second (see pagetime) are not
#     recorded.
# pagetime: likewise, the pagetime column records the time spent on the page
#     just left, to 1-second precision; stays shorter than 1 second are not
#     recorded.
# edit: 1 if the student performs an "edit" action, otherwise 0. An "edit" is
#     changing a multiple-choice answer, or the end of a consecutive deletion
#     run in a fill-in answer (characters are added after the run, or the
#     student switches to the next question).
# before: time between the student getting a new page and their first answer
#     on it; written on each page's first edit row as the difference between
#     that row and the previous one, to 0.1-second precision.
# after: time between the student completing a page's answers and switching
#     to the next page; written on each page's first visit row as the
#     difference between that row and the previous one, to 0.1-second
#     precision.
# AR: the student has completed a page, already entered the next one, and
#     then returned to edit the completed page; 1 if the edit action is of
#     this kind, otherwise 0.
# ==========================================================================
# Attention!!!
# Deciding whether a row is an edit row requires comparing it with the rows
# before and after it, i.e. 3 rows of context --> 1 computed middle row.
# So while iterating over the rows we always compute the edit, before, after
# and AR values of the PREVIOUS row, and the four lists must finally be
# rotated forward by one position.
# ==========================================================================
import pandas as pd
from datetime import datetime
import json


def caltime(date1, date2):
    """Compute the number of seconds between two datetime strings.

    :param date1: start time
    :param date2: end time
    """
    if date1 == '' or date2 == '':
        return ""
    datetimeFormat = '%Y-%m-%dT%H:%M:%S.%f+08:00'  # two possible timestamp formats, both must be handled
    datetimeFormat2 = '%Y-%m-%dT%H:%M:%S+08:00'
    try:
        d1 = datetime.strptime(str(date1), datetimeFormat)
    except ValueError:
        d1 = datetime.strptime(str(date1), datetimeFormat2)
    try:
        d2 = datetime.strptime(str(date2), datetimeFormat)
    except ValueError:
        d2 = datetime.strptime(str(date2), datetimeFormat2)
    delta = d2 - d1  # subtracting two datetimes gives a timedelta
    miao = delta.seconds
    if miao == 0:
        miao = ""
    return miao


data = pd.read_excel(r'A_demo.xlsx')  # input document: complete, sorted process data

# Global variables
old_id = ""              # id of the previous row
old_task_name = ""       # task name of the previous row
oldold_answer_dict = {}  # answers on the open page, two rows back
old_answer_dict = {}     # answers on the open page, one row back
oldold_page = 0          # page open two rows back
old_page = 0             # page open one row back
oldold_time = ""         # timestamp of the action two rows back
old_time = ""            # timestamp of the previous action
starttime = ""           # these two values are used to compute page-visit times
stoptime = ""
max_page = 0             # computing AR requires checking whether the row returns to a completed page, i.e. whether its page number is below the largest visited so far
before_tag = []          # computing before requires each page's first edit row, so this records the pages the current student has already edited

# The new column data finally written to the output sheet
page_list = []
page_time_list = []
edit_list = []
AR_list = []
before_list = []
after_list = []


for index, row in data.iterrows():
    # iterate over the sheet row by row
    print(index)
    # extract the key fields of the row
    id = str(row['ticket_id'])
    task_id = str(row["task_id"])
    task_name = str(row["task_name"])
    timestamp = str(row["timestamp"])
    answer = json.loads(row['task_answer'])
    frame = answer["frame"]
    answer_dict = {}
    if frame is not None:
        # If answer information can be extracted from the row, take the answer
        # data; otherwise the row may be a null row separating students.
        dataa = frame["data"]
        page = int(dataa["page"])
        # Update the starttime and stoptime used for per-page durations and
        # write the page and pagetime columns.
        if (old_id != id) or (old_task_name != task_name):
            print("###")
            starttime = timestamp
            page_list.append("")
            page_time_list.append("")
        elif page != 1 and old_page != page:
            print("&&&")
            stoptime = timestamp
            delta = caltime(starttime, stoptime)
            starttime = timestamp
            page_list.append(old_page)
            page_time_list.append(delta)
        else:
            print("@@@")
            page_list.append("")
            page_time_list.append("")
        # Process the concrete answers; writes the edit, before and AR columns.
        answer = dataa["answer"]
        answer_list = list(answer)
        # Extract the answers of the currently open page from the student's
        # full answer set and save them into the global answer_dict.
        if task_name == "热身题【本题不计入总分】":
            # warm-up question
            pass
        elif task_name == "运动会问题":
            # sports-meet problem
            if page == 1:
                answer_dict['P3'] = answer_list[0]
            elif page == 2:
                answer_dict['MM60311'] = answer_list[1]
            elif page == 3:
                answer_dict['MM60321'] = answer_list[2]
            elif page == 4:
                answer_dict['MM60331'] = answer_list[3]
            elif page == 5:
                answer_dict['MM60341_wang'] = answer_list[4]
                answer_dict['MM60341_ming'] = answer_list[5]
                answer_dict['MM60341_zhang'] = answer_list[6]
                answer_dict['MM60341_li'] = answer_list[7]
                answer_dict['MM60341_hua'] = answer_list[8]
            elif page == 6:
                answer_dict['MM60351'] = answer_list[9]
        elif task_name == "生活水平问题":
            # living-standard problem
            if page == 1:
                answer_dict['P4'] = answer_list[0]
            elif page == 2:
                answer_dict['MM60411'] = answer_list[1]
            elif page == 3:
                answer_dict['MM60421'] = answer_list[2]
            elif page == 4:
                answer_dict['MM60431'] = answer_list[3]
                answer_dict['MM60432'] = answer_list[4]
            elif page == 5:
                answer_dict['MM60441'] = answer_list[5]
                answer_dict['MM60442'] = answer_list[6]
            elif page == 6:
                answer_dict['MM60451_1'] = answer_list[7]
                answer_dict['MM60451_2'] = answer_list[8]
            elif page == 7:
                answer_dict['MM60461_1'] = answer_list[9]
                answer_dict['MM60461_2'] = answer_list[10]
                answer_dict['MM60461_3'] = answer_list[11]
                answer_dict['MM60461_4'] = answer_list[12]
        # Decide whether [the row BEFORE the current row] is an edit action.
        # ====================================================================
        # From the current page, look back two pages at old_page and
        # oldold_page. old_page is an edit page when:
        #   old_page equals oldold_page, and
        #   old_page is a multiple-choice page:
        #     the number of "-1" occurrences is unchanged
        #   old_page is a fill-in page:
        #     the answer string shrank relative to oldold, and either
        #       page equals old_page and the string grew again, or
        #       page differs from old_page
        # ====================================================================
        # NOTES: the number and type of questions differ per page, so a single
        # unified handler (function) is hard to extract; for now each question
        # and page is judged separately, to be refactored later.
        if old_page != oldold_page:
            edit_list.append("0")
            before_list.append("")
            if max_page < old_page:
                time_tmp = caltime(oldold_time, old_time)
                if time_tmp == "":
                    time_tmp = 0
                after_list.append(time_tmp)
            else:
                after_list.append("")
        else:
            print(before_tag)
            after_list.append("")
            if old_page not in before_tag:
                before_tag.append(old_page)
                time_tmp = caltime(oldold_time, old_time)
                if time_tmp == "":
                    time_tmp = 0
                before_list.append(time_tmp)
            else:
                before_list.append("")
            if old_task_name == "运动会问题":
                if old_page in [1, 2, 3, 6]:
                    old_answer = str(list(old_answer_dict.values()))
                    oldold_answer = str(list(oldold_answer_dict.values()))
                    if old_answer.count("-1") == oldold_answer.count("-1"):
                        edit_list.append("1")
                    else:
                        edit_list.append("0")
                elif old_page == 4:
                    if len(old_answer_dict["MM60331"]) < len(oldold_answer_dict["MM60331"]):
                        if page != old_page:
                            edit_list.append("1")
                        elif page == old_page and len(old_answer_dict["MM60331"]) < len(answer_dict["MM60331"]):
                            edit_list.append("1")
                        else:
                            edit_list.append("0")
                    else:
                        edit_list.append("0")
                elif old_page == 5:
                    old_answer = old_answer_dict["MM60341_wang"] + old_answer_dict["MM60341_ming"] + old_answer_dict["MM60341_zhang"] + old_answer_dict["MM60341_li"] + old_answer_dict["MM60341_hua"]
                    oldold_answer = oldold_answer_dict["MM60341_wang"] + oldold_answer_dict["MM60341_ming"] + oldold_answer_dict["MM60341_zhang"] + oldold_answer_dict["MM60341_li"] + oldold_answer_dict["MM60341_hua"]
                    if len(old_answer) < len(oldold_answer):
                        if page != old_page:
                            edit_list.append("1")
                        elif page == old_page:
                            now_answer = answer_dict["MM60341_wang"] + answer_dict["MM60341_ming"] + answer_dict["MM60341_zhang"] + answer_dict["MM60341_li"] + answer_dict["MM60341_hua"]
                            if len(old_answer) < len(now_answer):
                                edit_list.append("1")
                            else:
                                edit_list.append("0")
                        else:
                            edit_list.append("0")
                    else:
                        edit_list.append("0")
                else:
                    edit_list.append("0")
            elif old_task_name == "生活水平问题":
                if old_page in [1, 2, 3, 7]:
                    old_answer = str(list(old_answer_dict.values()))
                    oldold_answer = str(list(oldold_answer_dict.values()))
                    if old_answer.count("-1") == oldold_answer.count("-1"):
                        edit_list.append("1")
                    else:
                        edit_list.append("0")
                elif old_page == 4:
                    old_answer = old_answer_dict["MM60431"] + old_answer_dict["MM60432"]
                    oldold_answer = oldold_answer_dict["MM60431"] + oldold_answer_dict["MM60432"]
                    if len(old_answer) < len(oldold_answer):
                        if page != old_page:
                            edit_list.append("1")
                        elif page == old_page:
                            now_answer = answer_dict["MM60431"] + answer_dict["MM60432"]
                            if len(old_answer) < len(now_answer):
                                edit_list.append("1")
                            else:
                                edit_list.append("0")
                        else:
                            edit_list.append("0")
                    else:
                        edit_list.append("0")
                elif old_page == 5:
                    old_answer = old_answer_dict["MM60441"] + old_answer_dict["MM60442"]
                    oldold_answer = oldold_answer_dict["MM60441"] + oldold_answer_dict["MM60442"]
                    if len(old_answer) < len(oldold_answer):
                        if page != old_page:
                            edit_list.append("1")
                        elif page == old_page:
                            now_answer = answer_dict["MM60441"] + answer_dict["MM60442"]
                            if len(old_answer) < len(now_answer):
                                edit_list.append("1")
                            else:
                                edit_list.append("0")
                        else:
                            edit_list.append("0")
                    else:
                        edit_list.append("0")
                elif old_page == 6:
                    old_answer = old_answer_dict["MM60451_1"] + old_answer_dict["MM60451_2"]
                    oldold_answer = oldold_answer_dict["MM60451_1"] + oldold_answer_dict["MM60451_2"]
                    if len(old_answer) < len(oldold_answer):
                        if page != old_page:
                            edit_list.append("1")
                        elif page == old_page:
                            now_answer = answer_dict["MM60451_1"] + answer_dict["MM60451_2"]
                            if len(old_answer) < len(now_answer):
                                edit_list.append("1")
                            else:
                                edit_list.append("0")
                        else:
                            edit_list.append("0")
                    else:
                        edit_list.append("0")
                else:
                    edit_list.append("0")
            else:
                edit_list.append("0")
        edit = int(edit_list[-1])
        # Write the AR column; an AR row must first of all be an edit row.
        # AR condition: (edit == 1) && (the row's page number is below the
        # largest page number visited so far)
        if max_page < old_page:
            max_page = old_page
        if edit == 1 and old_page < max_page:
            AR_list.append("1")
        else:
            AR_list.append("0")
        # routine update of the global variables
        oldold_page = old_page
        old_page = page
        oldold_time = old_time
        old_time = timestamp
    else:
        # The row is an empty null row used to separate different students'
        # answers, so the per-student variables can be re-initialized here.
        page = 0
        before_tag = []
        max_page = 0
        # If the user id or task_name changed, set the current time as
        # starttime, i.e. the start time of the first question.
        if (old_id != id) or (old_task_name != task_name):
            print("###")
            starttime = timestamp
        # Fill the new columns with suitable empty values.
        page_list.append("")
        page_time_list.append("")
        edit_list.append("0")
        AR_list.append("0")
        before_list.append("")
        # For the after column, however, switching users necessarily means a
        # page switch, so fill in the computed value instead.
        time_tmp = caltime(oldold_time, old_time)
        if time_tmp == "":
            time_tmp = 0
        after_list.append(time_tmp)
        # routine update of the global variables
        oldold_time = old_time
        old_time = timestamp
        oldold_page = old_page
        old_page = page
    # routine update of the global variables
    old_id = id
    old_task_name = task_name
    oldold_answer_dict = old_answer_dict
    old_answer_dict = answer_dict


# Rotate the four lists forward by one position
x = edit_list.pop(0)
edit_list.append(x)

x = AR_list.pop(0)
AR_list.append(x)

x = before_list.pop(0)
before_list.append(x)

x = after_list.pop(0)
after_list.append(x)

# Write the 6 new lists into the original pandas data
col_name = data.columns.tolist()

col_name.insert(col_name.index('task_answer')+1, 'AR')
col_name.insert(col_name.index('task_answer')+1, 'after')
col_name.insert(col_name.index('task_answer')+1, 'before')
col_name.insert(col_name.index('task_answer')+1, 'edit')
col_name.insert(col_name.index('task_answer')+1, 'pagetime')
col_name.insert(col_name.index('task_answer')+1, 'page')
data = data.reindex(columns=col_name)
data['AR'] = AR_list
data['after'] = after_list
data['before'] = before_list
data['edit'] = edit_list
data['pagetime'] = page_time_list
data['page'] = page_list

data.to_excel('A_demo_out.xlsx')  # export
```
@@ -1,33 +0,0 @@

```python
# split.py
import pandas as pd

# Each sheet of the raw log is split by contest_id into an A part
# (higher-order ability tests B/C) and a Z part (mathematical modelling).
data1 = pd.read_excel(r'ticket_log_PBL_testing3.xlsx')
A1 = data1[data1['contest_id'].str.contains('高阶能力测试B|高阶能力测试C')]
Z1 = data1[data1['contest_id'].str.contains('数学建模')]
A1.to_excel('A1.xlsx')
Z1.to_excel('Z1.xlsx')

data2 = pd.read_excel(r'ticket_log_PBL_testing3.xlsx', sheet_name='Result 2')
A2 = data2[data2['contest_id'].str.contains('高阶能力测试B|高阶能力测试C')]
Z2 = data2[data2['contest_id'].str.contains('数学建模')]
A2.to_excel('A2.xlsx')
Z2.to_excel('Z2.xlsx')

data3 = pd.read_excel(r'ticket_log_PBL_testing3.xlsx', sheet_name='Result 3')
A3 = data3[data3['contest_id'].str.contains('高阶能力测试B|高阶能力测试C')]
Z3 = data3[data3['contest_id'].str.contains('数学建模')]
A3.to_excel('A3.xlsx')
Z3.to_excel('Z3.xlsx')

data4 = pd.read_excel(r'ticket_log_PBL_testing3.xlsx', sheet_name='Result 4')
A4 = data4[data4['contest_id'].str.contains('高阶能力测试B|高阶能力测试C')]
Z4 = data4[data4['contest_id'].str.contains('数学建模')]
A4.to_excel('A4.xlsx')
Z4.to_excel('Z4.xlsx')

A = pd.concat([A1, A2, A3, A4])
Z = pd.concat([Z1, Z2, Z3, Z4])

A.to_excel('A.xlsx')
Z.to_excel('Z.xlsx')
```
@@ -1,60 +0,0 @@

```python
# statistic_time.py
import pandas as pd

data_a = pd.read_excel(r'a_out_0719.xlsx')
data_z = pd.read_excel(r'z_out_0719.xlsx')

# Z-paper bins (3-minute steps)
level0to3 = 0
level3to6 = 0
level6to9 = 0
level9to12 = 0
level12to15 = 0
levelabove15 = 0

# A-paper bins (5-minute steps)
level0to5 = 0
level5to10 = 0
level10to15 = 0
level15to20 = 0
level20to25 = 0
level25to30 = 0
levelabove30 = 0

print("A")
for index, row in data_a.iterrows():
    if pd.isna(row['time_new']):
        continue
    fen = float(row['time_new'])
    if fen <= 5:
        level0to5 = level0to5 + 1
    elif fen > 5 and fen <= 10:
        level5to10 = level5to10 + 1
    elif fen > 10 and fen <= 15:
        level10to15 = level10to15 + 1
    elif fen > 15 and fen <= 20:
        level15to20 = level15to20 + 1
    elif fen > 20 and fen <= 25:
        level20to25 = level20to25 + 1
    elif fen > 25 and fen <= 30:
        level25to30 = level25to30 + 1
    else:
        levelabove30 = levelabove30 + 1
print(level0to5, level5to10, level10to15, level15to20, level20to25, level25to30, levelabove30)

print("z")
for index, row in data_z.iterrows():
    if pd.isna(row['time_new']):
        continue
    fen = float(row['time_new'])
    if fen <= 3:
        level0to3 = level0to3 + 1
    elif fen > 3 and fen <= 6:
        level3to6 = level3to6 + 1
    elif fen > 6 and fen <= 9:
        level6to9 = level6to9 + 1
    elif fen > 9 and fen <= 12:
        level9to12 = level9to12 + 1
    elif fen > 12 and fen <= 15:
        level12to15 = level12to15 + 1
    else:
        levelabove15 = levelabove15 + 1

print(level0to3, level3to6, level6to9, level9to12, level12to15, levelabove15)
```
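The manual if/elif binning in statistic_time.py can also be written with pandas' `pd.cut`; a sketch over hypothetical sample durations, using the A-paper 5-minute bins:

```python
import pandas as pd

# hypothetical sample durations in minutes, standing in for the time_new column
durations = pd.Series([2.5, 7.0, 12.3, 31.0])
bins = [0, 5, 10, 15, 20, 25, 30, float("inf")]
# pd.cut uses right-closed intervals (0, 5], (5, 10], ..., matching the
# fen <= 5, 5 < fen <= 10, ... conditions above
counts = pd.cut(durations, bins=bins).value_counts(sort=False)
print(counts)
```

`value_counts(sort=False)` keeps the bins in interval order, so the printed counts line up with the level variables in the script.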
@@ -1,11 +0,0 @@

#### Outcome Data Processing

The programs in this directory derive, from the scoring rubric, an automated coding method and coding results for students' answers, and compute answer-duration statistics.

- getscore.py - provides the functions that compute the code for each question
- calc_a.py - computes the codes for paper A
- calc_z.py - computes the codes for paper Z
- calc_time_a.py - computes the answer-duration statistics for paper A
- calc_time_z.py - computes the answer-duration statistics for paper Z

Because the questions and data are confidential, the coding rubric and the processed outcome data cannot be published here.
@ -1,169 +0,0 @@
```python
import pandas as pd

import getscore as gs

data = pd.read_excel(r'out_0712.xlsx')

P3_codelist = []
mm60311_CODElist = []
mm60321_CODElist = []
mm60341_CODElist = []
xiaowang = []
xiaowangcompare = []
xiaoming = []
xiaozhang = []
xiaoli = []
xiaohua = []
mm60351_CODElist = []
P4_codelist = []
mm60411_CODElist = []
mm60421_CODElist = []
mm60441_CODElist = []
mm60442_CODElist = []
shouru = []
jiage = []
mm60461_CODElist = []

for index, row in data.iterrows():
    P3 = row['P3']
    P3_CODE = gs.compareP3(P3)
    P3_codelist.append(P3_CODE)
    mm60311 = row['MM60311']  # depends on MM60331
    mm60331_CODE = row['MM60331_CODE']
    mm60311_CODE = gs.get60311(mm60311, mm60331_CODE)
    mm60311_CODElist.append(mm60311_CODE)

    mm60321 = row['MM60321']
    mm60321_CODE = gs.get60321(mm60321)
    mm60321_CODElist.append(mm60321_CODE)

    mm60331 = row['MM60331_new']
    mm60331_formula = gs.get331(mm60331)
    wang = gs.cal331(mm60331, mm60331_formula, 2, 2, 1, 9.1, 7.15, 1.61, 1, 2, 1.61, 9.1)
    xiaowang.append(wang)
    ming = gs.cal331(mm60331, mm60331_formula, 4, 1, 3, 9.8, 7.82, 1.54, 1, 4, 7.82, 9.8)
    xiaoming.append(ming)
    zhang = gs.cal331(mm60331, mm60331_formula, 3, 4, 5, 9.3, 6.54, 1.47, 3, 5, 9.3, 1.47)
    xiaozhang.append(zhang)
    li = gs.cal331(mm60331, mm60331_formula, 5, 5, 4, 10.1, 6.32, 1.51, 4, 5, 1.51, 10.1)
    xiaoli.append(li)
    hua = gs.cal331(mm60331, mm60331_formula, 1, 3, 2, 8.5, 6.93, 1.58, 1, 3, 8.5, 6.93)
    xiaohua.append(hua)

    mm60341_An = [row['MM60341'], row['MM60342'], row['MM60343'], row['MM60344'], row['MM60345']]
    mm60341_List = [wang, ming, zhang, li, hua]
    mm60341_CODE = gs.get60341(mm60341_An, mm60341_List, mm60331)
    mm60341_CODElist.append(mm60341_CODE)

    mm60351 = row['MM60351']  # depends on MM60331 and MM60341
    mm60351_CODE = gs.get60351(mm60351, mm60331_CODE, [wang, ming, zhang, li, hua])
    mm60351_CODElist.append(mm60351_CODE)

    P4 = row['P4']
    P4_CODE = gs.compareP3(P4)
    P4_codelist.append(P4_CODE)

    mm60411 = row['MM60411']
    mm60411_CODE = gs.get60411(mm60411)
    mm60411_CODElist.append(mm60411_CODE)

    mm60421 = row['MM60421']
    mm60421_CODE = gs.get60421(mm60421)
    mm60421_CODElist.append(mm60421_CODE)

    mm60431 = row['MM60431_new']
    mm60432 = row['MM60432_new']
    mm60441 = row['MM60441']  # compared with the value computed from the MM60431 formula
    shouru_right = gs.cal431(mm60431, 90000, 50000, 10, 8, 0.55, 0.5)
    shouru.append(shouru_right)
    mm60441_CODE = gs.compareFor(mm60441, shouru_right, mm60431)
    mm60441_CODElist.append(mm60441_CODE)

    mm60442 = row['MM60442']  # compared with the value computed from the MM60432 formula
    jiage_right = gs.cal431(mm60432, 90000, 50000, 10, 8, 0.55, 0.5)
    jiage.append(jiage_right)
    mm60442_CODE = gs.compareFor(mm60442, jiage_right, mm60432)
    mm60442_CODElist.append(mm60442_CODE)

    mm60461 = row['MM60461']  # the MM60461 answer depends on items 461-464
    mm60462 = row['MM60462']
    mm60463 = row['MM60463']
    mm60464 = row['MM60464']
    mm60461_CODE = gs.get60461([mm60461, mm60462, mm60463, mm60464])
    mm60461_CODElist.append(mm60461_CODE)

col_name = data.columns.tolist()

col_name.insert(col_name.index('P3') + 1, 'P3_CODE')
data = data.reindex(columns=col_name)
data['P3_CODE'] = P3_codelist

col_name.insert(col_name.index('MM60311') + 1, 'MM60311_CODE')
data = data.reindex(columns=col_name)
data['MM60311_CODE'] = mm60311_CODElist

col_name.insert(col_name.index('MM60321') + 1, 'MM60321_CODE')
data = data.reindex(columns=col_name)
data['MM60321_CODE'] = mm60321_CODElist

col_name.insert(col_name.index('MM60341') + 1, 'wang')
data = data.reindex(columns=col_name)
data['wang'] = xiaowang

col_name.insert(col_name.index('MM60342') + 1, 'ming')
data = data.reindex(columns=col_name)
data['ming'] = xiaoming

col_name.insert(col_name.index('MM60343') + 1, 'zhang')
data = data.reindex(columns=col_name)
data['zhang'] = xiaozhang

col_name.insert(col_name.index('MM60344') + 1, 'li')
data = data.reindex(columns=col_name)
data['li'] = xiaoli

col_name.insert(col_name.index('MM60345') + 1, 'hua')
data = data.reindex(columns=col_name)
data['hua'] = xiaohua

col_name.insert(col_name.index('MM60341'), 'MM60341_CODE')
data = data.reindex(columns=col_name)
data['MM60341_CODE'] = mm60341_CODElist

col_name.insert(col_name.index('MM60351') + 1, 'MM60351_CODE')
data = data.reindex(columns=col_name)
data['MM60351_CODE'] = mm60351_CODElist

col_name.insert(col_name.index('MM60431') + 1, 'shouru')
data = data.reindex(columns=col_name)
data['shouru'] = shouru

col_name.insert(col_name.index('MM60432') + 1, 'jiage')
data = data.reindex(columns=col_name)
data['jiage'] = jiage

col_name.insert(col_name.index('P4') + 1, 'P4_CODE')
data = data.reindex(columns=col_name)
data['P4_CODE'] = P4_codelist

col_name.insert(col_name.index('MM60411') + 1, 'MM60411_CODE')
data = data.reindex(columns=col_name)
data['MM60411_CODE'] = mm60411_CODElist

col_name.insert(col_name.index('MM60421') + 1, 'MM60421_CODE')
data = data.reindex(columns=col_name)
data['MM60421_CODE'] = mm60421_CODElist

col_name.insert(col_name.index('MM60441') + 1, 'MM60441_CODE')
data = data.reindex(columns=col_name)
data['MM60441_CODE'] = mm60441_CODElist

col_name.insert(col_name.index('MM60442') + 1, 'MM60442_CODE')
data = data.reindex(columns=col_name)
data['MM60442_CODE'] = mm60442_CODElist

col_name.insert(col_name.index('MM60461') + 1, 'MM60461_CODE')
data = data.reindex(columns=col_name)
data['MM60461_CODE'] = mm60461_CODElist

data.to_excel('a_out_0714.xlsx')
```
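The tail of the script above repeats the same three-line pattern many times: insert a new column name right after its source column, reindex, then assign the values. A small helper can express that pattern once; `insert_after` below is a hypothetical name, not part of the repository:

```python
import pandas as pd

def insert_after(df, anchor, new_col, values):
    # Insert `values` as column `new_col` immediately after column `anchor`.
    cols = df.columns.tolist()
    cols.insert(cols.index(anchor) + 1, new_col)
    df = df.reindex(columns=cols)
    df[new_col] = values
    return df

df = pd.DataFrame({'P3': ['B,D,A,C'], 'MM60311': ['D']})
df = insert_after(df, 'P3', 'P3_CODE', [40])
print(df.columns.tolist())  # ['P3', 'P3_CODE', 'MM60311']
```

Using one shared `insert_after` would also avoid the drift in the original, where `col_name` keeps accumulating names while `data.columns` lags one step behind each `reindex`.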
@ -1,72 +0,0 @@
```python
import pandas as pd
from datetime import datetime

data = pd.read_excel(r'a_out_0714.xlsx')

time_new_list = []
empty = 0
level0to5 = 0
level5to10 = 0
level10to15 = 0
level15to20 = 0
level20to25 = 0
level25to30 = 0
levelabove30 = 0

datetimeFormat = '%Y-%m-%dT%H:%M:%S.%f+08:00'
datetimeFormat2 = '%Y-%m-%dT%H:%M:%S+08:00'

for index, row in data.iterrows():
    P3_CODE = row['P3_CODE']
    MM60311_CODE = row['MM60311_CODE']
    if pd.isna(row['stop_time']) and (int(P3_CODE) != 99 or int(MM60311_CODE) != 99):
        print(P3_CODE, MM60311_CODE, index)
        empty += 1
        time_new_list.append("")
        continue
    if P3_CODE == '99' or pd.isna(row['start_time']) or pd.isna(row['stop_time']):
        time_new_list.append("")
        continue
    time_ori = float(row['cost_time'])
    # timestamps appear in two variants, with and without fractional seconds
    try:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat)
    except ValueError:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat2)
    try:
        date2 = datetime.strptime(str(row['stop_time']), datetimeFormat)
    except ValueError:
        date2 = datetime.strptime(str(row['stop_time']), datetimeFormat2)
    delta = date2 - date1
    miao = delta.seconds       # note: .seconds drops any whole-day component of the timedelta
    fen = round(miao / 60, 2)  # duration in minutes
    if fen <= 5:
        level0to5 += 1
    elif 5 < fen <= 10:
        level5to10 += 1
    elif 10 < fen <= 15:
        level10to15 += 1
    elif 15 < fen <= 20:
        level15to20 += 1
    elif 20 < fen <= 25:
        level20to25 += 1
    elif 25 < fen <= 30:
        level25to30 += 1
    else:
        levelabove30 += 1
    time_new_list.append(fen)

col_name = data.columns.tolist()

col_name.insert(col_name.index('cost_time') + 1, 'time_new')
data = data.reindex(columns=col_name)
data['time_new'] = time_new_list

print(empty, level0to5, level5to10, level10to15, level15to20, level20to25, level25to30, levelabove30)

data.to_excel('a_out_0715.xlsx')
```
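The timestamp handling above tries a format with fractional seconds first and falls back to one without. That fallback can be factored into one function, and `total_seconds()` avoids the `timedelta.seconds` pitfall noted in the comments (it ignores any whole-day component). A minimal sketch, with `parse_ts` as a hypothetical helper name:

```python
from datetime import datetime

FORMATS = ('%Y-%m-%dT%H:%M:%S.%f+08:00', '%Y-%m-%dT%H:%M:%S+08:00')

def parse_ts(s):
    # Try each known timestamp layout in turn.
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognized timestamp: ' + s)

start = parse_ts('2022-07-14T09:00:00+08:00')
stop = parse_ts('2022-07-14T09:12:30.500000+08:00')
minutes = round((stop - start).total_seconds() / 60, 2)
print(minutes)  # 12.51
```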
@ -1,93 +0,0 @@
```python
import pandas as pd
from datetime import datetime

data = pd.read_excel(r'z_out_0715.xlsx')

time_new_list = []
empty = 0
level0to3 = 0
level3to6 = 0
level6to9 = 0
level9to12 = 0
level12to15 = 0
levelabove15 = 0

level0to5 = 0
level5to10 = 0
level10to15 = 0
level15to20 = 0
level20to25 = 0
level25to30 = 0
levelabove30 = 0

datetimeFormat = '%Y-%m-%dT%H:%M:%S.%f+08:00'
datetimeFormat2 = '%Y-%m-%dT%H:%M:%S+08:00'

for index, row in data.iterrows():
    P1_CODE = row['P1_CODE']
    MM60101_CODE = row['MM60101_CODE']
    if pd.isna(row['stop_time']) and (int(P1_CODE) != 99 or int(MM60101_CODE) != 99):
        print(P1_CODE, MM60101_CODE, index)
        empty += 1
        time_new_list.append("")
        continue
    if int(P1_CODE) == 99 or pd.isna(row['start_time']) or pd.isna(row['stop_time']):
        time_new_list.append("")
        continue
    time_ori = float(row['cost_time'])
    try:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat)
    except ValueError:
        date1 = datetime.strptime(str(row['start_time']), datetimeFormat2)
    try:
        date2 = datetime.strptime(str(row['stop_time']), datetimeFormat)
    except ValueError:
        date2 = datetime.strptime(str(row['stop_time']), datetimeFormat2)
    delta = date2 - date1
    miao = delta.seconds       # note: .seconds drops any whole-day component of the timedelta
    fen = round(miao / 60, 2)  # duration in minutes
    # 5-minute bins
    if fen <= 5:
        level0to5 += 1
    elif 5 < fen <= 10:
        level5to10 += 1
    elif 10 < fen <= 15:
        level10to15 += 1
    elif 15 < fen <= 20:
        level15to20 += 1
    elif 20 < fen <= 25:
        level20to25 += 1
    elif 25 < fen <= 30:
        level25to30 += 1
    else:
        levelabove30 += 1

    # 3-minute bins
    if fen <= 3:
        level0to3 += 1
    elif 3 < fen <= 6:
        level3to6 += 1
    elif 6 < fen <= 9:
        level6to9 += 1
    elif 9 < fen <= 12:
        level9to12 += 1
    elif 12 < fen <= 15:
        level12to15 += 1
    else:
        levelabove15 += 1
    time_new_list.append(fen)

col_name = data.columns.tolist()

col_name.insert(col_name.index('cost_time') + 1, 'time_new')
data = data.reindex(columns=col_name)
data['time_new'] = time_new_list

print(empty, level0to3, level3to6, level6to9, level9to12, level12to15, levelabove15)
print(empty, level0to5, level5to10, level10to15, level15to20, level20to25, level25to30, levelabove30)

data.to_excel('z_out_0715_time.xlsx')
```
@ -1,94 +0,0 @@
```python
import pandas as pd
import getscore as gs

data = pd.read_excel(r'z_out_0713_修改后.xlsx')
# manu = pd.read_csv(r'z_manu.csv')
code_dict = {}

p1_CODElist = []
mm60101_CODElist = []
mm60102_CODElist = []
mm60104_CODElist = []
mm60105_CODElist = []
mm60107_CODElist = []
mm60108_CODElist = []

shui_list = []
'''
for index, row in manu.iterrows():
    key = str(row['id'])
    value = str(row['MM60104'])
    code_dict[key] = value
'''
for index, row in data.iterrows():
    # id = row['ticket_id']
    # mm60104_CODE = code_dict[str(id)]
    # mm60104_CODElist.append(mm60104_CODE)
    p1 = row['P1']
    p1_CODE = gs.compareP3(p1)
    p1_CODElist.append(p1_CODE)

    mm60101 = row['MM60101']
    mm60101_CODE = gs.get60101(mm60101)
    mm60101_CODElist.append(mm60101_CODE)

    mm60102 = row['MM60102']
    mm60102_CODE = gs.get60102(mm60102)
    mm60102_CODElist.append(mm60102_CODE)

    mm60104 = row['MM60104_new']
    mm60104_CODE = row['MM60104_CODE']
    shui = gs.cal104(mm60104, gs.get104(mm60104), 2, 18, 500, 3, 3, 25, 1, 1, 7, 4)
    shui_list.append(shui)
    mm60105 = row['MM60105']
    mm60105_CODE = gs.compareZ(mm60104_CODE, shui, mm60105)
    mm60105_CODElist.append(mm60105_CODE)

    mm601071, mm601072 = row['MM601071'], row['MM601072']
    mm60107_CODE = gs.get60107(mm601071, mm601072, mm60105_CODE)
    mm60107_CODElist.append(mm60107_CODE)

    mm60108 = [row['MM601081'], row['MM601082'], row['MM601083'], row['MM601084']]
    mm60108_CODE = gs.get60108(mm60108, mm60102_CODE)
    mm60108_CODElist.append(mm60108_CODE)

col_name = data.columns.tolist()

col_name.insert(col_name.index('P1') + 1, 'P1_CODE')
data = data.reindex(columns=col_name)
data['P1_CODE'] = p1_CODElist

col_name.insert(col_name.index('MM60101') + 1, 'MM60101_CODE')
data = data.reindex(columns=col_name)
data['MM60101_CODE'] = mm60101_CODElist

col_name.insert(col_name.index('MM60102') + 1, 'MM60102_CODE')
data = data.reindex(columns=col_name)
data['MM60102_CODE'] = mm60102_CODElist

'''
col_name.insert(col_name.index('MM60104')+1, 'MM60104_CODE')
data = data.reindex(columns=col_name)
data['MM60104_CODE'] = mm60104_CODElist
'''
col_name.insert(col_name.index('MM60105') + 1, 'shui')
data = data.reindex(columns=col_name)
data['shui'] = shui_list

col_name.insert(col_name.index('MM60105') + 2, 'MM60105_CODE')
data = data.reindex(columns=col_name)
data['MM60105_CODE'] = mm60105_CODElist

col_name.insert(col_name.index('MM601072') + 1, 'MM60107_CODE')
data = data.reindex(columns=col_name)
data['MM60107_CODE'] = mm60107_CODElist

col_name.insert(col_name.index('MM601084') + 1, 'MM60108_CODE')
data = data.reindex(columns=col_name)
data['MM60108_CODE'] = mm60108_CODElist

data.to_excel('z_out_0714.xlsx')
```
@ -1,433 +0,0 @@
```python
import pandas as pd

# (removed four accidental auto-imports: decimal.MAX_EMAX, itertools.count,
#  turtle.right, unittest.result -- they shadowed local names and were unused)

file = open("error1.txt", "w")

def compareUp(right, answer):
    if pd.isna(right) or pd.isna(answer) or right == "" or answer == "":
        return False
    try:
        right = float(right)
        answer = float(answer)
    except ValueError:
        return False
    if right < 10:
        right = right * 10
        answer = answer * 10
    try:
        r = [int(right), round(right), int(100 * right), round(100 * right)]
    except OverflowError:
        return False
    a1 = int(answer)
    a2 = round(answer)
    if a1 in r:
        return True
    if a2 in r:
        return True
    return False

def compare(right, answer):
    # consider the raw answer and the answer rounded to the nearest integer
    # consider the correct value and the correct value rounded
    # only the integer parts are compared; return True if any pair matches
    if pd.isna(right) or pd.isna(answer) or right == "" or answer == "":
        return False
    try:
        right = float(right)
        answer = float(answer)
    except ValueError:
        return False
    if right < 10:
        right = right * 10
        answer = answer * 10
    try:
        r = [int(right), round(right)]
    except OverflowError:
        return False
    a1 = int(answer)
    a2 = round(answer)
    if a1 in r:
        return True
    if a2 in r:
        return True
    return False

def compareZ(code104, right, answer):
    if pd.isna(right) or pd.isna(answer) or right == "" or answer == "":
        return 99
    try:
        right = float(right)
        answer = float(answer)
    except ValueError:
        return 70
    result = compare(right, answer)
    if result:
        if code104 != 40:
            if answer == 450:
                return 74
            elif 440 <= answer <= 460:
                return 41
            elif answer < 440:
                return 42
            elif answer > 460:
                return 43
            else:
                return 40
        else:
            if answer == 450:
                return 74
            elif 440 <= answer <= 460:
                return 71
            elif answer < 440:
                return 72
            elif answer > 460:
                return 73
    return 70

def compareFor(answer, right, formula):
    # if the stored formula still evaluates as-is, score 70
    try:
        eval(formula)
        return 70
    except Exception:
        pass
    if pd.isna(right) or pd.isna(answer) or right == "" or answer == "":
        return 99
    if compareUp(right, answer):
        return 40
    return 70

def compareP3(input):
    # could the sequence contain other than 4 items?
    s = str(input)
    s = s.replace(' ', '')
    if s == "":
        return 99
    l = s.split(',')
    if len(l) < 4:
        return 99
    count = 0
    if l[0] == 'B':
        count += 1
    if l[1] == 'D':
        count += 1
    if l[2] == 'A':
        count += 1
    if l[3] == 'C':
        count += 1
    if count == 0:
        return 70
    elif count == 1:
        return 10
    elif count == 2:
        return 20
    elif count == 3:
        return 30
    else:
        return 40

def get60107(mm71, mm72, mm60105):
    if pd.isna(mm71) or pd.isna(mm72) or mm71 == "" or mm72 == "":
        return 99
    if mm60105 == 40 and mm71 == 'A' and mm72 == 'D':
        return 20
    if mm60105 == 40 and mm71 == 'A' and mm72 == 'A':
        return 40
    if mm60105 in [71, 41] and mm71 == 'A' and mm72 == 'A':
        return 41
    if mm60105 in [72, 42] and mm71 == 'B' and mm72 == 'D':
        return 42
    if mm60105 in [73, 43] and mm71 == 'B' and mm72 == 'C':
        return 43
    if mm60105 == 74 and mm71 == 'A' and mm72 == 'B':
        return 44
    return 70

def get60108(answer, mm60102_CODE):
    blank = 0
    count = 0
    for i in range(4):
        if pd.isna(answer[i]) or answer[i] == "":
            blank += 1
    if blank == 4:
        return 99
    if answer[0] == 'A':
        count += 1
    if answer[1] == 'C':
        count += 1
    if answer[2] == 'B':
        count += 1
    if answer[3] == 'A':
        count += 1
    if count == 2:
        return 10
    if count == 3:
        return 20
    if count == 4:
        return 40
    if mm60102_CODE == 30 and answer[0] == 'A' and answer[1] == 'A' and answer[2] == 'B' and answer[3] == 'A':
        return 41
    return 70

def get60311(answer, mm60331_CODE):
    if answer == "" or pd.isna(answer):
        return 99
    if answer == "D":
        return 40
    if answer == 'A' and mm60331_CODE in [31, 35, 41, 45]:
        return 30
    if answer == 'B' and mm60331_CODE in [32, 36, 42, 46]:
        return 31
    if answer == 'C' and mm60331_CODE in [33, 37, 43, 47]:
        return 32
    return 70

def get60101(answer):
    if answer == "" or pd.isna(answer):
        return 99
    if answer == "C":
        return 40
    return 70

def get60321(answer):
    if answer == "" or pd.isna(answer):
        return 99
    if answer == "D":
        return 40
    if answer == "C":
        return 20
    return 70

def get60411(answer):
    if answer == "" or pd.isna(answer):
        return 99
    if answer == "B":
        return 40
    return 70

def get60421(answer):
    if answer == "" or pd.isna(answer):
        return 99
    if answer == "C":
        return 40
    if answer == "B":
        return 20
    return 70

def get60102(answer):
    if answer == "" or pd.isna(answer):
        return 99
    s = str(answer)
    s = s.replace(' ', '')
    l = s.split(',')
    right = ['A', 'D', 'E', 'H']
    if len(l) == 2 and l[0] in right and l[1] in right:
        return 20
    if len(l) == 3 and l[0] in right and l[1] in right and l[2] in right:
        return 20
    if len(l) == 4 and l[0] in right and l[1] in right and l[2] in right and l[3] in right:
        return 40
    if len(l) == 5:
        if 'G' in l:
            return 30
        else:
            return 31
    return 70

def get60341(answer, right, formula):
    # if the stored formula still evaluates as-is, score 70
    try:
        eval(formula)
        return 70
    except Exception:
        pass
    # does "blank" mean all five answers are blank?
    count = 0
    blank = 0
    for i in range(5):
        if pd.isna(answer[i]) or answer[i] == "":
            blank += 1
    if blank == 5:
        return 99
    for i in range(5):
        if pd.isna(answer[i]) or pd.isna(right[i]):
            continue
        if compare(answer[i], right[i]):
            count += 1
    if count == 0:
        return 70
    if count == 1:
        return 71
    if count == 2:
        return 10
    if count == 3:
        return 20
    if count == 4:
        return 30
    if count == 5:
        return 40

def cal331(formula, f, RR, BR, JR, RS, BS, JS, GR, WR, GS, WS):
    f = str(f)
    f = f.replace('RR', str(RR))
    f = f.replace('BR', str(BR))
    f = f.replace('JR', str(JR))
    f = f.replace('RS', str(RS))
    f = f.replace('BS', str(BS))
    f = f.replace('JS', str(JS))
    f = f.replace('GR', str(GR))
    f = f.replace('WR', str(WR))
    f = f.replace('GS', str(GS))
    f = f.replace('WS', str(WS))
    f = f.replace('÷', '/')
    f = f.replace('×', '*')
    result = ""
    try:
        result = eval(f)
        print(f + "=" + str(eval(f)))
    except SyntaxError:
        file.write("SyntaxError " + str(formula) + "\n")
        return ""
    except NameError:
        file.write("NameError " + str(formula) + "\n")
        return ""
    except TypeError:
        file.write("TypeError " + str(formula) + "\n")
        return ""
    return result

def cal104(formula, f, E, L, V, W, T, C, G, U, D, P):
    f = str(f)
    f = f.replace('L', str(L))
    f = f.replace('V', str(V))
    f = f.replace('C', str(C))
    f = f.replace('G', str(G))
    f = f.replace('D', str(D))
    f = f.replace('P', str(P))
    f = f.replace('E', str(E))
    f = f.replace('T', str(T))
    f = f.replace('W', str(W))
    f = f.replace('U', str(U))
    f = f.replace('÷', '/')
    f = f.replace('×', '*')
    result = ""
    try:
        result = eval(f)
        print(f + "=" + str(eval(f)))
    except SyntaxError:
        file.write("SyntaxError " + str(formula) + "\n")
        return ""
    except NameError:
        file.write("NameError " + str(formula) + "\n")
        return ""
    except TypeError:
        file.write("TypeError " + str(formula) + "\n")
        return ""
    return str(result)

def cal431(f, Y, y, A, a, B, b):
    f = str(f)
    f = f.replace('Y', str(Y))
    f = f.replace('y', str(y))
    f = f.replace('A', str(A))
    f = f.replace('B', str(B))
    f = f.replace('a', str(a))
    f = f.replace('b', str(b))
    f = f.replace('÷', '/')
    f = f.replace('×', '*')
    result = ""
    try:
        result = eval(f)
        print(f + "=" + str(eval(f)))
    except SyntaxError:
        file.write("SyntaxError " + str(f) + "\n")
        return ""
    except NameError:
        file.write("NameError " + str(f) + "\n")
        return ""
    except TypeError:
        file.write("TypeError " + str(f) + "\n")
        return ""
    except ZeroDivisionError:
        file.write("ZeroDivisionError " + str(f) + "\n")
        return ""
    except OverflowError:
        file.write("OverflowError " + str(f) + "\n")
        return ""
    return result


def get331(f):
    # map the Chinese phrases in student formulas to short symbols
    s = str(f)
    s = s.replace('该同学50米跑的排名', 'RR')
    s = s.replace('该同学实心球的排名', 'BR')
    s = s.replace('该同学立定跳远的排名', 'JR')
    s = s.replace('该同学50米跑的成绩', 'RS')
    s = s.replace('该同学实心球的成绩', 'BS')
    s = s.replace('该同学立定跳远的成绩', 'JS')
    s = s.replace('该同学表现最好项目的排名', 'GR')
    s = s.replace('该同学表现最差项目的排名', 'WR')
    s = s.replace('该同学表现最好项目的成绩', 'GS')
    s = s.replace('该同学表现最差项目的成绩', 'WS')
    return s

def get104(f):
    # map the Chinese phrases in student formulas to short symbols
    s = str(f)
    s = s.replace('每人每天要刷2次牙', 'E')
    s = s.replace('牙刷的长度为18厘米', 'L')
    s = s.replace('漱口杯的容量为500毫升', 'V')
    s = s.replace('水龙头1分钟会流出3升水', 'W')
    s = s.replace('每次刷牙平均需要3分钟', 'T')
    s = s.replace('刷牙时的水温为25摄氏度', 'C')
    s = s.replace('每次刷牙要使用1厘米牙膏', 'G')
    s = s.replace('每次正常刷牙使用1升水(用于漱口、冲洗牙刷等)', 'U')
    s = s.replace('每周7天', 'D')
    s = s.replace('家中包括4个成员', 'P')
    return s


def get60351(answer, mm60331, l):
    max_student = l.index(max(l))
    min_student = l.index(min(l))
    max_student = ['A', 'B', 'C', 'D', 'E'][max_student]
    min_student = ['A', 'B', 'C', 'D', 'E'][min_student]
    if answer == '' or pd.isna(answer):
        return 99
    if mm60331 in [10, 72] and answer == max_student:
        return 20
    if mm60331 in [20, 38] and answer == max_student:
        return 30
    if mm60331 in [30, 31, 32, 33, 40, 41, 42, 43] and answer == min_student:
        return 40
    if mm60331 in [34, 35, 36, 37, 44, 45, 46, 47] and answer == max_student:
        return 41
    return 70

def get60461(answer):
    count = 0
    blank = 0
    right = ['B', 'A', 'A', 'B']
    for i in range(4):
        if pd.isna(answer[i]) or answer[i] == "":
            blank += 1
    if blank == 4:
        return 99
    for i in range(4):
        if pd.isna(answer[i]) or pd.isna(right[i]):
            continue
        if answer[i] == right[i]:
            count += 1
    if count == 1:
        return 10
    if count == 2:
        return 20
    if count == 3:
        return 30
    if count == 4:
        return 40
    return 70
```
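The `get331`/`cal331` pair above implements a substitute-then-evaluate scorer: Chinese phrases become short symbols, the symbols become numbers, `÷`/`×` are normalized, and the string is passed to `eval`. A stripped-down sketch of the same pattern, using a dict as the symbol table (`SYMBOLS` and `evaluate` are illustrative names; the values are the ones passed for the first student in the calling script):

```python
# Example symbol values, taken from the first cal331 call: RR=2, BR=2, JR=1.
SYMBOLS = {'RR': 2, 'BR': 2, 'JR': 1}

def evaluate(formula, symbols):
    f = formula
    for name, value in symbols.items():
        f = f.replace(name, str(value))
    f = f.replace('÷', '/').replace('×', '*')
    try:
        # eval is only safe here because the input is controlled classroom data
        return eval(f)
    except (SyntaxError, NameError, TypeError, ZeroDivisionError, OverflowError):
        return ""  # mirror the scripts: malformed formulas yield an empty result

print(evaluate('(RR+BR)×JR', SYMBOLS))  # 4
```

Note that plain string replacement is order-sensitive: if one symbol name were a prefix of another, the longer name would have to be replaced first, which is why the original maps whole phrases to two-letter codes before substituting numbers.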
@ -0,0 +1,122 @@
```python
# -*- coding: utf-8 -*-
import os
import pandas as pd
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster, set_link_color_palette
from matplotlib import pyplot as plt

filePath = "file1"  # input folder
bigname = 111

def os_file(path):  # iterate over the files in the input folder
    filenames = os.listdir(path)

    for filename in filenames:
        print(filename)
        filename1 = filename[0:-5]  # strip the file extension
        print(filename1)
        if filename != ".DS_Store":
            file_reader(filePath + "/" + filename, filename1)


def file_reader(path, name):
    df1 = pd.read_excel(path, sheet_name="Sheet1")  # the input file must be on Sheet1
    df1 = np.array(df1)
    importance = df1[-1, :]  # last row: the weights
    importance = np.delete(importance, 0, axis=0)
    df1 = df1[0:-1, :]  # every row except the weight row

    # separate the words from the score matrix
    word = df1[:, 0]  # first column: the words
    data = np.delete(df1, 0, axis=1)  # remaining columns: the scores
    data = data * importance  # multiply the scores by the weights
    nums, indics = hierarchy_cluster(data, word, name)
    print(indics)
    for i in range(len(indics)):
        group = "为一组"
        for j in range(len(indics[i])):
            group = word[indics[i][j]] + " " + group
        print(group)


def hierarchy_cluster(data, word, name, method='complete', threshold=600.0):  # complete-linkage
    '''Hierarchical clustering.

    Arguments:
        data [[0, float, ...], [float, 0, ...]] -- distance between document i and document j

    Keyword Arguments:
        method {str} -- linkage method: single, complete, average, centroid, median, ward (default: {'average'})
        threshold {float} -- distance between cluster groups
    Return:
        cluster_number int -- number of clusters
        cluster [[idx1, idx2,..], [idx3]] -- indices belonging to each cluster
    '''
    data = np.array(data)
    plt.figure(figsize=(10, 15), dpi=300)  # figure width/height; dpi is the resolution
    Z = linkage(data, method=method, metric='euclidean')  # Euclidean distance

    cluster_assignments = fcluster(Z, threshold, criterion='distance')

    num_clusters = cluster_assignments.max()

    indices = get_cluster_indices(cluster_assignments)
    z = linkage(data, method='ward')
    print(z.shape)
    dendrogram(z, labels=word, color_threshold=80, orientation='right', leaf_font_size=8, above_threshold_color='black')
    set_link_color_palette(['#0000FF', '#4A766E', '#2F4F4F', '871F78', 'FF7F00', 'E47833', 'FF6666', 'FFCCFF'])
    # color_threshold sets the cut height; orientation='left' flips the plot
    # leaf_font_size controls the label spacing

    plt.grid(True, which='minor', ls='--')  # 'minor' hides the grid lines, 'major' shows them
    plt.title(name, fontdict={'fontproperties': 'Times New Roman', 'size': 10})  # title font and size
    plt.yticks(fontproperties='Times New Roman', size=8)  # y-axis font and size
    plt.xticks(fontproperties='Times New Roman', size=8)
    plt.plot(linewidth='0.5')  # line width

    f = plt.gcf()

    f.savefig(name + ".png")
    plt.show()
    f.clear()

    return num_clusters, indices


def get_cluster_indices(cluster_assignments):  # helper for the hierarchical clustering
    '''Map each cluster back to the original data indices.

    Arguments:
        cluster_assignments -- result of the hierarchical clustering

    Returns:
        [[idx1, idx2,..], [idx3]] -- indices belonging to each cluster
    '''
    n = cluster_assignments.max()
    indices = []
    for cluster_number in range(1, n + 1):
        indices.append(np.where(cluster_assignments == cluster_number)[0])

    return indices


if __name__ == "__main__":
    os_file(filePath)
```
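The clustering core of the script above is just two SciPy calls: `linkage` builds the merge tree and `fcluster` cuts it at a distance threshold. A minimal, self-contained example of that flow (synthetic 2-D points standing in for the weighted word-score rows):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight pairs of points, far apart from each other.
data = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])

Z = linkage(data, method='complete', metric='euclidean')  # complete-linkage, as in the script
labels = fcluster(Z, t=1.0, criterion='distance')          # cut the tree at distance 1.0

num_clusters = labels.max()
print(num_clusters)  # 2
```

Raising `t` merges more points into fewer clusters; the script's `threshold=600.0` plays the same role on its much larger-scale distances.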
@ -0,0 +1,55 @@
```python
import os
from random import shuffle
from train import getFeature
from drawRadar import draw
import joblib
import numpy as np
import pyaudio
import wave

path = r'wave'

wav_paths = []

person_dirs = os.listdir(path)
for person in person_dirs:
    if person.endswith('txt'):
        continue
    emotion_dir_path = os.path.join(path, person)
    emotion_dirs = os.listdir(emotion_dir_path)
    for emotion_dir in emotion_dirs:
        if emotion_dir.endswith('.ini'):
            continue
        emotion_file_path = os.path.join(emotion_dir_path, emotion_dir)
        emotion_files = os.listdir(emotion_file_path)
        for file in emotion_files:
            if not file.endswith('wav'):
                continue
            wav_path = os.path.join(emotion_file_path, file)
            wav_paths.append(wav_path)

# shuffle the wav files into random order
# shuffle(wav_paths)

model = joblib.load("classfier.m")

p = pyaudio.PyAudio()
for wav_path in wav_paths:
    f = wave.open(wav_path, 'rb')
    stream = p.open(
        format=p.get_format_from_width(f.getsampwidth()),
        channels=f.getnchannels(),
        rate=f.getframerate(),
        output=True)
    data = f.readframes(f.getparams()[3])
    stream.write(data)
    stream.stop_stream()
    stream.close()
    f.close()
    data_feature = getFeature(wav_path, 48)
    print(model.predict([data_feature]))
    print(model.predict_proba([data_feature]))
    labels = np.array(['angry', 'Delate', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprised', 'TS'])
    draw(model.predict_proba([data_feature])[0], labels, 6)

p.terminate()
```
@ -0,0 +1,263 @@
|
|||
|
||||
import librosa
|
||||
import os
|
||||
from random import shuffle
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import pandas.core.ops
|
||||
from sklearn import svm
|
||||
import joblib
|
||||
import sklearn
|
||||
import logmmse
|
||||
import wave
|
||||
|
||||
from natsort import natsorted
|
||||
import warnings
|
||||
warnings.filterwarnings('ignore')
|
||||
|
||||
path = r'trainset/casio2'
EMOTION_LABEL = {
    'angry': '1',
    'Delate': '2',
    'disgust': '3',
    'fear': '4',
    'happy': '5',
    'neutral': '6',
    'sad': '7',
    'surprised': '8',
    'TS': '9'
}


# C: penalty parameter of the error term, i.e. the tolerance for errors.
#    The larger C is, the less error is tolerated.
# gamma: with the RBF kernel, the larger gamma is, the fewer support vectors;
#    the smaller gamma is, the more support vectors.
# kernel: linear, poly, rbf, sigmoid, precomputed
# decision_function_shape: ovo, ovr (default)

'''
This module contains the feature-extraction part and the SVM part.
Feature extraction depends on librosa, which keeps causing problems.
'''
def getFeature(path, mfcc_feature_num=16):
    y, sr = librosa.load(path)

    # Extract the MFCC features of each audio file
    # y: audio time series; n_mfcc: number of MFCCs to return
    mfcc_feature = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16)
    zcr_feature = librosa.feature.zero_crossing_rate(y=y)
    energy_feature = librosa.feature.rms(y=y)
    rms_feature = librosa.feature.rms(y=y)

    # Keep the first mfcc_feature_num frame-level MFCC values
    mfcc_feature = mfcc_feature.T.flatten()[:mfcc_feature_num]
    zcr_feature = zcr_feature.flatten()
    energy_feature = energy_feature.flatten()
    rms_feature = rms_feature.flatten()

    # Reduce ZCR, energy and RMS to their means
    zcr_feature = np.array([np.mean(zcr_feature)])
    energy_feature = np.array([np.mean(energy_feature)])
    rms_feature = np.array([np.mean(rms_feature)])

    data_feature = np.concatenate((mfcc_feature, zcr_feature, energy_feature,
                                   rms_feature))

    return data_feature
def deNoise(path):
    f = wave.open(path, "r")
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    data = f.readframes(nframes)
    f.close()
    data = np.frombuffer(data, dtype=np.short)  # np.fromstring is deprecated

    # Denoise with logmmse
    data = logmmse.logmmse(data=data, sampling_rate=framerate)

    # Save the denoised audio as "save" + path
    file_save = "save" + path
    nframes = len(data)
    f = wave.open(file_save, 'w')
    f.setparams((1, 2, framerate, nframes, 'NONE', 'NONE'))  # channels, sample width, sample rate, frames, ...
    f.writeframes(data)
    f.close()
def getData(mfcc_feature_num=16):
    """Collect the features and emotion labels of every wav file in the dataset."""
    wav_file_path = []
    person_dirs = os.listdir(path)
    for person in person_dirs:
        if person.endswith('txt'):
            continue
        emotion_dir_path = os.path.join(path, person)
        emotion_dirs = os.listdir(emotion_dir_path)
        for emotion_dir in emotion_dirs:
            if emotion_dir.endswith('.ini'):
                continue
            emotion_file_path = os.path.join(emotion_dir_path, emotion_dir)
            emotion_files = os.listdir(emotion_file_path)
            for file in emotion_files:
                if not file.endswith('wav'):
                    continue
                wav_path = os.path.join(emotion_file_path, file)
                wav_file_path.append(wav_path)

    # Shuffle the wav files into random order
    shuffle(wav_file_path)
    data_feature = []
    data_labels = []

    for wav_file in wav_file_path:
        # deNoise(wav_file)
        # NOTE: assumes the denoised copies ("save" + path) produced by deNoise already exist
        data_feature.append(getFeature("save" + wav_file, mfcc_feature_num))
        # The parent directory name is the emotion label
        data_labels.append(int(EMOTION_LABEL[wav_file.split('/')[-2]]))

    return np.array(data_feature), np.array(data_labels)
def getData1(mfcc_feature_num, path):
    """Collect the features of every wav file under `path` (no labels)."""
    wav_file_path = []
    person_dirs = os.listdir(path)
    for person in person_dirs:
        if person.endswith('txt'):
            continue
        emotion_dir_path = os.path.join(path, person)
        emotion_dirs = os.listdir(emotion_dir_path)
        for emotion_dir in emotion_dirs:
            if emotion_dir.endswith('.ini'):
                continue
            emotion_file_path = os.path.join(emotion_dir_path, emotion_dir)
            emotion_files = os.listdir(emotion_file_path)
            emotion_files = natsorted(emotion_files)
            for file in emotion_files:
                if not file.endswith('wav'):
                    continue
                wav_path = os.path.join(emotion_file_path, file)
                wav_file_path.append(wav_path)

    # Keep the natural-sorted order (no shuffling here)
    data_feature = []

    for wav_file in wav_file_path:
        data_feature.append(getFeature(wav_file, mfcc_feature_num))

    return np.array(data_feature), wav_file_path
def train():
    # Grid-search an SVM over C and the MFCC feature count
    best_acc = 0
    best_mfcc_feature_num = 0
    best_C = 0

    for C in range(13, 20):
        for i in range(40, 55):
            data_feature, data_labels = getData(i)
            split_num = 200
            train_data = data_feature[:split_num, :]
            train_label = data_labels[:split_num]
            test_data = data_feature[split_num:, :]
            test_label = data_labels[split_num:]
            clf = svm.SVC(
                decision_function_shape='ovo',
                kernel='rbf',
                C=C,
                gamma=0.0003,
                probability=True)
            print("train start")
            clf.fit(train_data, train_label)
            print("train over")
            print(C, i)
            # acc_dict is built but never used afterwards
            acc_dict = {}
            for test_x, test_y in zip(test_data, test_label):
                pre = clf.predict([test_x])[0]
                if pre in acc_dict.keys():
                    continue
                acc_dict[pre] = test_y
            acc = sklearn.metrics.accuracy_score(
                clf.predict(test_data), test_label)
            if acc > best_acc:
                best_acc = acc
                best_C = C
                best_mfcc_feature_num = i
            print('best_acc', best_acc)
            print('best_C', best_C)
            print('best_mfcc_feature_num', best_mfcc_feature_num)
            print()

            # Save the model
            joblib.dump(clf,
                        'Models/C_' + str(C) + '_mfccNum_' + str(i) + '.m')

    print('most_best_acc', best_acc)
    print('best_C', best_C)
    print('best_mfcc_feature_num', best_mfcc_feature_num)
def getData2(path):
    data_features, wavefile = getData1(52, path)
    label = []
    # Load the trained model once, outside the loop
    new_svm2 = joblib.load('Models/C_16_mfccNum_52.m')
    for data_feature in data_features:
        kk = new_svm2.predict(data_feature.reshape(1, -1))
        label.append(str(kk[0]))

    print(label)
    return label, wavefile
def run():
    paths = ["wav/1-1", "wav/1-2", "wav/1-5", "wav/1-7", "wav/1-14"]

    # Map the numeric labels back to emotion names by inverting EMOTION_LABEL
    label_to_emotion = {v: k for k, v in EMOTION_LABEL.items()}

    for path in paths:
        label, wavefile = getData2(path)
        emotions = [label_to_emotion[labe] for labe in label]

        c = {"label": label, "wavefile": wavefile, "emotions": emotions}
        mySeries = pd.DataFrame(c)
        # Write the table to Excel; the context manager saves and closes the file
        with pd.ExcelWriter(path + ".xlsx") as writer:
            mySeries.to_excel(writer, float_format='%.5f')


if __name__ == "__main__":
    train()