Confused about the Query Expression blocks on the Extract Information Operator

March 12, 2018, 2:13 pm

I'm trying to extract word counts from a block of text. I have the Create Document Operator (where I have pasted my text) linked to the Extract Information Operator. I have entered the words I want to extract (terrorist and civilian) into the attribute name blocks. What should I be putting in the query expression blocks? Thanks.
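
For reference, the counting itself can be sketched outside RapidMiner. The snippet below is only an illustration in plain Python, not the query expression syntax of the Extract Information Operator; the function name `count_words` and the sample text are made up for this example. It counts whole-word, case-insensitive matches, so "civilians" does not count towards "civilian".

```python
import re
from collections import Counter


def count_words(text, targets):
    """Count whole-word, case-insensitive occurrences of each target word."""
    # Split the lower-cased text into runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear.
    return {word: counts[word.lower()] for word in targets}


# Hypothetical sample text, not the text from the post.
sample = "The terrorist fled. One civilian saw the terrorist; civilians were unharmed."
print(count_words(sample, ["terrorist", "civilian"]))
```

If plural forms should be counted as well, the tokens would need stemming or a broader pattern first.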

↧

Extract decision tree from Bray-curtis heatmap dendrogram

March 12, 2018, 6:28 pm

I am performing a microbiome study and have already generated (using another program) a heatmap with dendrograms that cluster samples by bacterial genus using Bray-Curtis dissimilarity, but I'd like to get the decision tree. I know RapidMiner has a decision tree model, but it must use k-means, which is different from Bray-Curtis, and I want to preserve the Bray-Curtis clustering. Is it possible to load my dendrogram into RapidMiner and have it extract the Bray-Curtis decision tree? Thank you very much.
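
For readers unfamiliar with the measure mentioned above, here is a minimal sketch of Bray-Curtis dissimilarity between two samples, written in plain Python. The abundance vectors are invented for illustration, and returning 0.0 for two all-zero samples is a convention chosen here, not something taken from the post.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 0.0  # assumed convention for two empty samples
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical genus counts for two samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
print(bray_curtis(sample_a, sample_b))
```

The value ranges from 0 (identical composition) to 1 (no shared genera), which is what a dendrogram built on this measure groups samples by.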

↧

Data Appearing as Rows Instead of Attributes (Columns)

March 13, 2018, 8:05 am

Hello,

I am trying to get two entities from a website using XPath:

//h:h2[(@class='uvIdeaTitle')]/h:a/text()

//h:div[(@class='uvIdeaVoteCount')]/h:strong/text()

I get all of the correct data, but they appear as sequential rows, instead of separate columns under the Results tab.

I am using the following process:

Read Excel > Get Pages > Data to Documents > Process Documents (Cut Document).

How can I retrieve the data in the following structure:

URL -- Idea Title -- Vote Count

instead of

URL -- Idea Title

URL -- Vote Count

Thanks,

Blake
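
The reshaping asked for above can be sketched in plain Python. This is only an illustration of the row-to-column idea, not a RapidMiner process; the field names `title` and `votes`, the function name `pivot_rows`, and the example URLs are hypothetical. It assumes each page yields titles and vote counts in the same order and in equal numbers, so the n-th title pairs with the n-th count.

```python
from collections import defaultdict


def pivot_rows(rows):
    """Turn (url, field, value) rows into (url, title, votes) records."""
    per_url = defaultdict(lambda: {"title": [], "votes": []})
    for url, field, value in rows:
        per_url[url][field].append(value)
    records = []
    for url, fields in per_url.items():
        # Pair titles and vote counts by position within each page.
        for title, votes in zip(fields["title"], fields["votes"]):
            records.append((url, title, votes))
    return records


# Hypothetical sequential rows, as they might appear in the Results tab.
rows = [
    ("http://example.com/page1", "title", "Idea A"),
    ("http://example.com/page1", "title", "Idea B"),
    ("http://example.com/page1", "votes", "12"),
    ("http://example.com/page1", "votes", "7"),
]
for record in pivot_rows(rows):
    print(record)
```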

↧

The dummy operator Windowing for Training (2) (replacing series:windowing) cannot be executed.

March 13, 2018, 8:26 am

Hi,

I built a time series forecasting process.

I need to display the chart result from the Studio version through a web service.

I don't get any errors in RapidMiner Studio, but on RapidMiner Server I get the following error:

de.rapidanalytics.ejb.service.ServiceDataSourceException: Error executing process /home/admin/processess/Property for service Property: The dummy operator Windowing for Training (2) (replacing series:windowing) cannot be executed.

I read somewhere that this is related to extensions, and I have already installed series forecasting, but the issue remains the same.

Please help.

Thank you.

↧

Collect Twitter user data automatically with the "Get Twitter User Details" Operator

March 13, 2018, 11:52 am

Hi there,

First of all, I am a RapidMiner rookie, so my apologies in advance for any stupid questions. I have not found an answer to my question in the community, so here I go:

I would like to analyse Tweets containing keywords using the “Search Twitter” Operator and then automatically look up the users who posted something containing the keywords. I know the “Get Twitter User Details” Operator allows me to fill in user IDs manually (one at a time), but I would like RapidMiner to take the user ID from the “Search Twitter” Operator and do it automatically. In the YouTube tutorial “Discover Twitter Content Using RapidMiner” (https://www.youtube.com/watch?v=ia2iV5Ws3zo), posted in another query, I learnt that a macro might be a way to do it, but I have not yet managed to read the user IDs from the “Search Twitter” results with the macro and pass them to the “Get Twitter User Details” Operator. I reckon I somehow need a loop to do it. I do not expect a final, clean solution, but maybe you guys have a hint. The XML code is attached for a better understanding of my problem and ideas.

Thanks very much in advance!

Julian

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="85"><parameter key="connection" value="Twitter_DUJ"/><parameter key="query" value="Energiewende"/><parameter key="result_type" value="recent or popular"/><parameter key="limit" value="100"/><parameter key="filter_by_geo_location" value="false"/><parameter key="radius_unit" value="miles"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85"><parameter key="attribute_filter_type" value="single"/><parameter key="attribute" value="From-User-Id"/><parameter key="attributes" value=""/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="true"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="set_macros" compatibility="8.1.001" expanded="true" height="124" name="Set Macros" width="90" x="313" y="85"><list key="macros"><parameter key="userid" value="#keyword1"/></list></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="social_media:get_twitter_user_details" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Details" width="90" x="380" y="391"><parameter key="connection" 
value="Twitter_DUJ"/><parameter key="query_type" value="name"/><parameter key="user" value="%{keyword1}"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="514" y="391"><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value="From-User"/><parameter key="attributes" value="Id|Language|Location|Name|Time-Zone|Tweets"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/></operator></process>
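
The loop described above can be sketched in plain Python to show the intended control flow: take each distinct value of the From-User-Id attribute from the search results and pass it to a lookup step once. This is not RapidMiner code; `lookup_user` is a hypothetical stand-in for the “Get Twitter User Details” step, and the sample rows are invented.

```python
def collect_user_details(tweets, lookup_user):
    """Look up each distinct author once, in first-seen order."""
    details = {}
    for tweet in tweets:
        user_id = tweet["From-User-Id"]
        if user_id not in details:
            # One lookup per user, even if the user posted several tweets.
            details[user_id] = lookup_user(user_id)
    return details


def fake_lookup(user_id):
    """Hypothetical lookup that only echoes the ID back."""
    return {"Id": user_id}


# Hypothetical search results with a repeated author.
tweets = [
    {"From-User-Id": 101},
    {"From-User-Id": 202},
    {"From-User-Id": 101},
]
print(collect_user_details(tweets, fake_lookup))
```

Deduplicating before the lookup keeps the number of calls down, which matters when the lookup is a rate-limited API.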

↧

dictionary based sentiment analysis using an own dictionary

March 14, 2018, 4:43 am

Hey guys,

I'm trying to apply a dictionary-based sentiment analysis using my own dictionary.

First, my dataset consists of Excel files of newspaper interviews, structured as follows: column 1 contains the text (split so that the first question is in row 1, the first answer in row 2, the second question in row 3, and so on), and column 2 contains an ID (the interview identifier, plus whether the row is a question or an answer and which one). I can retrieve the Excel files and process the documents, including tokenization. Afterwards I use the "Dictionary Based Sentiment" and "Apply Dictionary Based Sentiment" operators, but I cannot get the dictionary (an Excel file with the word in column 1 and the weight in column 2: 1 for positive, -1 for negative) to match the newspaper interviews.

Can you help me?

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.003"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="7.6.003" expanded="true" name="Process"><parameter key="logverbosity" value="init"/><parameter key="random_seed" value="2001"/><parameter key="send_mail" value="never"/><parameter key="notification_email" value=""/><parameter key="process_duration_for_mail" value="30"/><parameter key="encoding" value="UTF-8"/><process expanded="true"><operator activated="true" class="concurrency:loop_files" compatibility="7.6.003" expanded="true" height="82" name="Loop Files" width="90" x="45" y="34"><parameter key="directory" value="C:\Users\g21640\Desktop\ojee\bla"/><parameter key="filter_type" value="glob"/><parameter key="recursive" value="false"/><parameter key="enable_macros" value="false"/><parameter key="macro_for_file_name" value="file_name"/><parameter key="macro_for_file_type" value="file_type"/><parameter key="macro_for_folder_name" value="folder_name"/><parameter key="reuse_results" value="false"/><parameter key="enable_parallel_execution" value="true"/><process expanded="true"><operator activated="true" class="read_excel" compatibility="7.6.003" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34"><parameter key="excel_file" value="C:\Users\g21640\Desktop\ojee\bla\2000_0101_2003_3112(1).xlsx"/><parameter key="sheet_number" value="1"/><parameter key="imported_cell_range" value="A1:E43"/><parameter key="encoding" value="UTF-8"/><parameter key="first_row_as_names" value="false"/><list key="annotations"><parameter key="0" value="Name"/></list><parameter key="date_format" value=""/><parameter key="time_zone" value="SYSTEM"/><parameter key="locale" value="English (United States)"/><list key="data_set_meta_data_information"><parameter key="0" value="data.false.file_path.attribute"/><parameter key="1" value="id.false.integer.attribute"/><parameter key="2" 
value="item.true.polynominal.attribute"/><parameter key="3" value="type.false.polynominal.attribute"/><parameter key="4" value="idid.true.polynominal.id"/></list><parameter key="read_not_matching_values_as_missings" value="true"/><parameter key="datamanagement" value="double_array"/><parameter key="data_management" value="auto"/></operator><connect from_port="file object" to_op="Read Excel" to_port="file"/><connect from_op="Read Excel" from_port="output" to_port="output 1"/><portSpacing port="source_file object" spacing="0"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_output 1" spacing="0"/><portSpacing port="sink_output 2" spacing="0"/></process></operator><operator activated="true" class="append" compatibility="7.6.003" expanded="true" height="82" name="Append" width="90" x="179" y="34"><parameter key="datamanagement" value="double_array"/><parameter key="data_management" value="auto"/><parameter key="merge_type" value="all"/></operator><operator activated="true" class="nominal_to_text" compatibility="7.6.003" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"><parameter key="attribute_filter_type" value="single"/><parameter key="attribute" value="item"/><parameter key="attributes" value=""/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="nominal"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="file_path"/><parameter key="block_type" value="single_value"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="single_value"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/></operator><operator activated="true" breakpoints="after" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (8)" width="90" x="447" y="34"><parameter key="create_word_vector" 
value="true"/><parameter key="vector_creation" value="Term Occurrences"/><parameter key="add_meta_information" value="true"/><parameter key="keep_text" value="true"/><parameter key="prune_method" value="none"/><parameter key="prune_below_percent" value="3.0"/><parameter key="prune_above_percent" value="30.0"/><parameter key="prune_below_rank" value="0.05"/><parameter key="prune_above_rank" value="0.95"/><parameter key="datamanagement" value="double_sparse_array"/><parameter key="data_management" value="auto"/><parameter key="select_attributes_and_weights" value="false"/><list key="specify_weights"/><process expanded="true"><operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="34"><parameter key="transform_to" value="lower case"/></operator><operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="447" y="34"><parameter key="mode" value="non letters"/><parameter key="characters" value=".:"/><parameter key="language" value="English"/><parameter key="max_token_length" value="3"/></operator><connect from_port="document" to_op="Transform Cases" to_port="document"/><connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/><connect from_op="Tokenize" from_port="document" to_port="document 1"/><portSpacing port="source_document" spacing="0"/><portSpacing port="sink_document 1" spacing="0"/><portSpacing port="sink_document 2" spacing="0"/></process></operator><operator activated="true" class="set_role" compatibility="7.6.003" expanded="true" height="82" name="Set Role" width="90" x="581" y="34"><parameter key="attribute_name" value="text"/><parameter key="target_role" value="text"/><list key="set_additional_roles"><parameter key="idid" value="id"/><parameter key="text" value="text"/></list></operator><operator activated="true" breakpoints="after" 
class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="514" y="136"><parameter key="select_attributes_and_weights" value="true"/><list key="specify_weights"><parameter key="text" value="1.0"/></list></operator><operator activated="true" breakpoints="after" class="read_excel" compatibility="7.6.003" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="187"><parameter key="excel_file" value="C:\Users\g21640\Desktop\Dropbox\Promotion\rapidminer listen\BPW_Wortlisten_test.xlsx"/><parameter key="sheet_number" value="1"/><parameter key="imported_cell_range" value="A1:B12350"/><parameter key="encoding" value="SYSTEM"/><parameter key="first_row_as_names" value="true"/><list key="annotations"/><parameter key="date_format" value=""/><parameter key="time_zone" value="SYSTEM"/><parameter key="locale" value="English (United States)"/><list key="data_set_meta_data_information"><parameter key="0" value="A.true.polynominal.attribute"/><parameter key="1" value="B.true.integer.attribute"/></list><parameter key="read_not_matching_values_as_missings" value="true"/><parameter key="datamanagement" value="double_array"/><parameter key="data_management" value="auto"/></operator><operator activated="true" breakpoints="after" class="operator_toolbox:dictionary_sentiment_learner" compatibility="0.9.000" expanded="true" height="82" name="Dictionary Based Sentiment" width="90" x="313" y="187"><parameter key="Value Attribute" value="B"/><parameter key="Key Attribute" value="A"/><parameter key="Negation Attribute" value=""/><parameter key="Negation Window Size" value="1"/></operator><operator activated="true" class="operator_toolbox:apply_dictionary_learner" compatibility="0.9.000" expanded="true" height="103" name="Apply Dictionary Based Sentiment (2)" width="90" x="581" y="238"/><connect from_op="Loop Files" from_port="output 1" to_op="Append" to_port="example set 1"/><connect from_op="Append" 
from_port="merged set" to_op="Nominal to Text" to_port="example set input"/><connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data (8)" to_port="example set"/><connect from_op="Process Documents from Data (8)" from_port="example set" to_op="Set Role" to_port="example set input"/><connect from_op="Set Role" from_port="example set output" to_op="Data to Documents" to_port="example set"/><connect from_op="Data to Documents" from_port="documents" to_op="Apply Dictionary Based Sentiment (2)" to_port="doc"/><connect from_op="Read Excel (2)" from_port="output" to_op="Dictionary Based Sentiment" to_port="exa"/><connect from_op="Dictionary Based Sentiment" from_port="mod" to_op="Apply Dictionary Based Sentiment (2)" to_port="mod"/><connect from_op="Apply Dictionary Based Sentiment (2)" from_port="res" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/><description align="left" color="yellow" colored="true" height="128" resized="true" width="459" x="46" y="27">Step 1&lt;br&gt;</description></process></operator></process>

Best regards,

Daniel
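
As a reference for what the matching step is meant to do, here is a minimal sketch of dictionary-based scoring in plain Python. It is not the implementation of the operators named above; the function name `sentiment_score`, the tiny dictionary, and the sample sentence are made up. It lower-cases the text, splits it into letter-only tokens, and sums the weights of the tokens found in the dictionary. Negation is not handled.

```python
import re


def sentiment_score(text, dictionary):
    """Sum the dictionary weights of all tokens in the text."""
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return sum(dictionary.get(token, 0) for token in tokens)


# Hypothetical dictionary: 1 for positive words, -1 for negative words.
weights = {"good": 1, "bad": -1}
print(sentiment_score("Good results, not bad, good.", weights))
```

Dictionary keys have to be stored in the same form as the tokens (here lower case), otherwise no word will match.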

↧

predict unemployment rate using neural network

March 14, 2018, 8:58 am

Hi, here is my process XML. I would like to know if I am doing this right; many things are required when I click Run. I want to achieve the best neural network model for my data and to see the performance of my model, the data split, and the training and testing performance. When can I say that I already have the best model that I can use for prediction? What is wrong with my data? How do I make the labels? I don't understand it clearly. Any other suggestions are welcome. My target variable is the unemployment rate; the rest are independent variables. Can anyone please help me? Thank you very much.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="136"><parameter key="parameter_expression" value=""/><parameter key="condition_class" value="all"/><parameter key="invert_filter" value="false"/><list key="filters_list"><parameter key="filters_entry_key" value="Population.is_not_missing."/><parameter key="filters_entry_key" value="Labor force.is_not_missing."/><parameter key="filters_entry_key" value="Inflation.is_not_missing."/><parameter key="filters_entry_key" value="GDP.is_not_missing."/><parameter key="filters_entry_key" value="GNI.is_not_missing."/><parameter key="filters_entry_key" value="GDI.is_not_missing."/><parameter key="filters_entry_key" value="FOREIGN TRADE.is_not_missing."/><parameter key="filters_entry_key" value="INDUSTRY.is_not_missing."/><parameter key="filters_entry_key" value="ELEM.is_not_missing."/><parameter key="filters_entry_key" value="SECOND.is_not_missing."/><parameter key="filters_entry_key" value="HIGHERED.is_not_missing."/></list><parameter key="filters_logic_and" value="true"/><parameter key="filters_check_metadata" value="true"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="normalize" compatibility="8.1.001" expanded="true" height="103" name="Normalize" width="90" x="179" y="238"><parameter key="return_preprocessing_model" value="false"/><parameter key="create_view" value="false"/><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value=""/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="numeric"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="real"/><parameter key="block_type" value="value_series"/><parameter key="use_block_type_exception" 
value="false"/><parameter key="except_block_type" value="value_series_end"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter key="method" value="Z-transformation"/><parameter key="min" value="0.0"/><parameter key="max" value="1.0"/><parameter key="allow_negative_values" value="false"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="340"><parameter key="attribute_name" value="Unemployment"/><parameter key="target_role" value="label"/><list key="set_additional_roles"><parameter key="Population" value="regular"/><parameter key="Labor force" value="regular"/><parameter key="Inflation" value="regular"/><parameter key="GDP" value="regular"/><parameter key="GNI" value="regular"/><parameter key="GDI" value="regular"/><parameter key="FOREIGN TRADE" value="regular"/><parameter key="INDUSTRY" value="regular"/><parameter key="ELEM" value="regular"/><parameter key="SECOND" value="regular"/><parameter key="HIGHERED" value="regular"/></list></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="238"><parameter key="return_preprocessing_model" value="false"/><parameter key="create_view" value="false"/><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value=""/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter 
key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter key="default" value="average"/><list key="columns"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="split_data" compatibility="8.1.001" expanded="true" height="103" name="Split Data" width="90" x="447" y="289"><enumeration key="partitions"><parameter key="ratio" value="0.9"/><parameter key="ratio" value="0.1"/></enumeration><parameter key="sampling_type" value="linear sampling"/><parameter key="use_local_random_seed" value="false"/><parameter key="local_random_seed" value="1992"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net" width="90" x="380" y="85"><list key="hidden_layers"/><parameter key="training_cycles" value="500"/><parameter key="learning_rate" value="0.3"/><parameter key="momentum" value="0.2"/><parameter key="decay" value="false"/><parameter key="shuffle" value="true"/><parameter key="normalize" value="true"/><parameter key="error_epsilon" value="1.0E-5"/><parameter key="use_local_random_seed" value="false"/><parameter key="local_random_seed" value="1992"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="514" y="85"><list key="application_parameters"/><parameter key="create_view" value="false"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="782" y="85"><parameter key="use_example_weights" 
value="true"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="648" y="289"><list key="application_parameters"/><parameter key="create_view" value="false"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="performance" compatibility="8.1.001" expanded="true" height="82" name="Performance (2)" width="90" x="849" y="289"><parameter key="use_example_weights" value="true"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve MYDATA - Copy (2)" width="90" x="313" y="493"><parameter key="repository_entry" value="//NewLocalRepository/MYDATA - Copy"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="447" y="493"><parameter key="parameter_expression" value=""/><parameter key="condition_class" value="all"/><parameter key="invert_filter" value="false"/><list key="filters_list"><parameter key="filters_entry_key" value="Unemployment.is_missing."/></list><parameter key="filters_logic_and" value="true"/><parameter key="filters_check_metadata" value="true"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="normalize" compatibility="8.1.001" expanded="true" height="103" name="Normalize (2)" width="90" x="581" y="493"><parameter key="return_preprocessing_model" value="false"/><parameter key="create_view" value="false"/><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value=""/><parameter 
key="use_except_expression" value="false"/><parameter key="value_type" value="numeric"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="real"/><parameter key="block_type" value="value_series"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_series_end"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter key="method" value="Z-transformation"/><parameter key="min" value="0.0"/><parameter key="max" value="1.0"/><parameter key="allow_negative_values" value="false"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role (2)" width="90" x="715" y="493"><parameter key="attribute_name" value="Unemployment"/><parameter key="target_role" value="label"/><list key="set_additional_roles"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="782" y="595"><parameter key="return_preprocessing_model" value="false"/><parameter key="create_view" value="false"/><parameter key="attribute_filter_type" value="all"/><parameter key="attribute" value=""/><parameter key="attributes" value=""/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter 
key="default" value="average"/><list key="columns"/></operator></process><?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="Cross Validation" width="90" x="916" y="442"><parameter key="split_on_batch_attribute" value="false"/><parameter key="leave_one_out" value="false"/><parameter key="number_of_folds" value="10"/><parameter key="sampling_type" value="automatic"/><parameter key="use_local_random_seed" value="false"/><parameter key="local_random_seed" value="1992"/><parameter key="enable_parallel_execution" value="true"/><process expanded="true"><operator activated="true" class="neural_net" compatibility="8.1.001" expanded="true" height="82" name="Neural Net (2)" width="90" x="112" y="85"><list key="hidden_layers"/><parameter key="training_cycles" value="500"/><parameter key="learning_rate" value="0.3"/><parameter key="momentum" value="0.2"/><parameter key="decay" value="false"/><parameter key="shuffle" value="true"/><parameter key="normalize" value="true"/><parameter key="error_epsilon" value="1.0E-5"/><parameter key="use_local_random_seed" value="false"/><parameter key="local_random_seed" value="1992"/></operator><connect from_port="training set" to_op="Neural Net (2)" to_port="training set"/><connect from_op="Neural Net (2)" from_port="model" to_port="model"/><connect from_op="Neural Net (2)" from_port="exampleSet" to_port="through 1"/><portSpacing port="source_training set" spacing="0"/><portSpacing port="sink_model" spacing="0"/><portSpacing port="sink_through 1" spacing="0"/><portSpacing port="sink_through 2" spacing="0"/></process><process expanded="true"><operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="45" y="34"><list key="application_parameters"/><parameter key="create_view" value="false"/></operator><operator activated="true" 
class="performance_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance from Cross Validation" width="90" x="246" y="34"><parameter key="main_criterion" value="first"/><parameter key="accuracy" value="true"/><parameter key="classification_error" value="false"/><parameter key="kappa" value="false"/><parameter key="weighted_mean_recall" value="false"/><parameter key="weighted_mean_precision" value="false"/><parameter key="spearman_rho" value="false"/><parameter key="kendall_tau" value="false"/><parameter key="absolute_error" value="false"/><parameter key="relative_error" value="false"/><parameter key="relative_error_lenient" value="false"/><parameter key="relative_error_strict" value="false"/><parameter key="normalized_absolute_error" value="false"/><parameter key="root_mean_squared_error" value="false"/><parameter key="root_relative_squared_error" value="false"/><parameter key="squared_error" value="false"/><parameter key="correlation" value="false"/><parameter key="squared_correlation" value="false"/><parameter key="cross-entropy" value="false"/><parameter key="margin" value="false"/><parameter key="soft_margin_loss" value="false"/><parameter key="logistic_loss" value="false"/><parameter key="skip_undefined_labels" value="true"/><parameter key="use_example_weights" value="true"/><list key="class_weights"/></operator><connect from_port="model" to_op="Apply Model (3)" to_port="model"/><connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/><connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance from Cross Validation" to_port="labelled data"/><connect from_op="Performance from Cross Validation" from_port="performance" to_port="performance 1"/><portSpacing port="source_model" spacing="0"/><portSpacing port="source_test set" spacing="0"/><portSpacing port="source_through 1" spacing="0"/><portSpacing port="source_through 2" spacing="0"/><portSpacing port="sink_test set results" 
spacing="0"/><portSpacing port="sink_performance 1" spacing="0"/><portSpacing port="sink_performance 2" spacing="0"/></process></operator></process>

↧

Issues with processing data and clustering operators

March 14, 2018, 5:51 pm

≫ Next: Visualisation of Decision Trees

≪ Previous: predict unemployment rate using neural network

Hi,

I am working on a project in RapidMiner based on the Kaggle Walmart customer trip type prediction competition, but I want to use a clustering algorithm instead of prediction to find the maximum and minimum sales by day, and the departments with the maximum and minimum sales. I am using the same data set as the Kaggle competition.

I am new to data analytics and am trying to understand the operators I need to reach my result, but I am unable to get any further with the process. Please have a look at the process flow in the attachment and let me know where I am going wrong.

Dataset: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification/data

Regards,

Naman
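
One possible reading of the goal above (maximum and minimum sales per day and per department) is a group-and-aggregate task rather than a clustering task. The sketch below illustrates that idea only. The column names `weekday`, `department` and `scan_count` are hypothetical placeholders, the rows are made-up sample data, and using an item count as a stand-in for sales is an assumption that should be checked against the actual Kaggle columns.

```python
from collections import defaultdict


def total_by(rows, key, value="scan_count"):
    """Sum the value column for each distinct entry of the key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def extremes(totals):
    """Return the groups with the highest and the lowest total."""
    highest = max(totals, key=totals.get)
    lowest = min(totals, key=totals.get)
    return highest, lowest


# Made-up sample rows; the real data set would be loaded from the Kaggle files.
rows = [
    {"weekday": "Friday", "department": "DAIRY", "scan_count": 4},
    {"weekday": "Friday", "department": "PRODUCE", "scan_count": 2},
    {"weekday": "Sunday", "department": "DAIRY", "scan_count": 5},
    {"weekday": "Sunday", "department": "PRODUCE", "scan_count": 6},
    {"weekday": "Monday", "department": "DAIRY", "scan_count": 1},
]

by_day = total_by(rows, "weekday")
by_department = total_by(rows, "department")
busiest_day, quietest_day = extremes(by_day)
top_department, bottom_department = extremes(by_department)
```

In a RapidMiner process, a comparable result would probably come from grouping and aggregating the example set before any clustering step, although the attached process was not available to confirm what it currently does.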

↧

Visualisation of Decision Trees

March 14, 2018, 8:24 pm

≫ Next: [Example] Data Exploration and Time Series Analysis

≪ Previous: Issues with processing data and clustering operators

Hi,

I am finalising some decision tree outputs and have noticed that the edge lines are difficult to see (light grey). Is there a way to edit the decision tree layout to use solid black lines, and to edit other parts of the decision tree visually?

Cheers,

Chris

↧

[Example] Data Exploration and Time Series Analysis

March 15, 2018, 7:46 am

≫ Next: [Example] Direct Marketing with RapidMiner

≪ Previous: Visualisation of Decision Trees

Hello All,

Here is the process file for an example of data exploration and time series analysis from the data scientists at RapidMiner.

Enjoy!

↧

[Example] Direct Marketing with RapidMiner

March 15, 2018, 7:57 am

≫ Next: [Example] Employee Attrition with Deep Learning

≪ Previous: [Example] Data Exploration and Time Series Analysis

Hello All,

Here is an example model and data set of how to use RapidMiner to assist with direct marketing from our team of data scientists.

Enjoy!

↧

[Example] Employee Attrition with Deep Learning

March 15, 2018, 8:01 am

≫ Next: Finding Optimum Clustering Size

≪ Previous: [Example] Direct Marketing with RapidMiner

Hello All,

Enjoy this data set and example process from our team of data scientists here at RapidMiner for employee attrition using deep learning methods.

Cheers!

↧

Finding Optimum Clustering Size

March 15, 2018, 8:03 am

≫ Next: Histogram-Based Outlier Detection (HBOS) Scoring

≪ Previous: [Example] Employee Attrition with Deep Learning

Hello All,

Check out this data set and example process on how to find optimum clustering size from our team of data scientists here at RapidMiner.

Enjoy!

↧

Histogram-Based Outlier Detection (HBOS) Scoring

March 15, 2018, 8:09 am

≫ Next: Cross Validation with Python Models

≪ Previous: Finding Optimum Clustering Size

Hello All,

Here is an example process and data set for scoring histogram-based outliers from our data scientists at RapidMiner.

Enjoy!

↧

Cross Validation with Python Models

March 15, 2018, 8:27 am

≫ Next: [Example] Sentiment Analysis

≪ Previous: Histogram-Based Outlier Detection (HBOS) Scoring

Hello All,

Check out these models for cross validation with Python from our RapidMiner data scientists.

Enjoy!

↧

[Example] Sentiment Analysis

March 15, 2018, 8:58 am

≫ Next: [Example] Building Customer Data with the Tableau Integration

≪ Previous: Cross Validation with Python Models

Hello All,

Enjoy this data set and model example of sentiment analysis from our team of data scientists here at RapidMiner.

Cheers!

↧

[Example] Building Customer Data with the Tableau Integration

March 15, 2018, 9:02 am

≫ Next: Sensitivity Analysis of parameters with respect to output

≪ Previous: [Example] Sentiment Analysis

Hello All,

Here is a data set and cluster model example for building customer data with our Tableau integration.

Download the extension here

↧

Sensitivity Analysis of parameters with respect to output

March 16, 2018, 12:15 am

≫ Next: Facebook geo-location data / user details

≪ Previous: [Example] Building Customer Data with the Tableau Integration

Hi,

I am new to machine learning and RapidMiner. I am using RapidMiner to do a sensitivity analysis of my inputs with respect to an output. I have roughly 2000 inputs and would like to find out which of them affect the output the most. I have tried using the Optimize Selection operator to choose the most relevant features, but the resulting list contains over 1000 features, which is meaningless for my analysis. Is there a way to get a list of the most relevant features ranked by their weights? I know there are weighting operators such as Weight by Correlation or Weight by Information Gain, but these operators do not take the inter-correlation between the parameters into account. Is there a machine learning method that solves this issue, and if so, how can I do it in RapidMiner?

Thank you
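
A minimal sketch of one way to rank features while penalising inter-correlation, shown here in plain Python rather than as a RapidMiner process. It uses a greedy "relevance minus redundancy" score: relevance is the absolute Pearson correlation with the output, and redundancy is the mean absolute correlation with the features already selected. The function names, the scoring rule and the synthetic data are all illustrative assumptions, not a RapidMiner operator or the poster's data.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_features(columns, target, k):
    """Greedily pick k features by relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    selected = []
    remaining = set(columns)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() only makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (assumption): "a_copy" is a noisy duplicate of "a".
random.seed(0)
a = [random.gauss(0, 1) for _ in range(500)]
b = [random.gauss(0, 1) for _ in range(500)]
columns = {
    "a": a,
    "a_copy": [x + 0.3 * random.gauss(0, 1) for x in a],
    "b": b,
    "noise": [random.gauss(0, 1) for _ in range(500)],
}
target = [x + 0.8 * y for x, y in zip(a, b)]

ranking = rank_features(columns, target, 2)
```

A plain correlation ranking would place both `a` and `a_copy` at the top, whereas the redundancy term pushes the near-duplicate down once one of the pair has been chosen. For around 2000 inputs this pure-Python version would be slow, so it is meant only to show the idea.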

↧

Facebook geo-location data / user details

March 16, 2018, 10:27 am

≫ Next: Exporting clustering analysis data

≪ Previous: Sensitivity Analysis of parameters with respect to output

Hi there,

I have already read some posts about the RM Facebook operators, but was not able to find out how to get geo-location data from the Facebook posts retrieved by the “Find Pages” and “Find Page Content” operators. Does anybody have an idea? As I understand it, the Find Pages operator looks into company as well as individual (personal) profiles, with the restriction that they need to be public. Is that correct?

Do you know how to retrieve information about the user behind a certain post, such as the city he/she lives in or where the company is registered on Facebook? I have now been able to do this with the Twitter operators, but do not really know how to proceed with the Facebook operators. By the way, is the number of results limited by a total quantity or by date?

Thanks in advance!

Julian

↧

Exporting clustering analysis data

March 19, 2018, 1:32 am

≫ Next: Filter operator does not update statistics

≪ Previous: Facebook geo-location data / user details

Hi,

I have done k-means clustering using Auto Model and got results including the cluster label attribute and all clustering attributes, but in normalized form. When I use the Write Excel operator, it also writes my data to the file in normalized form, which I cannot use to make business decisions. Is there any way to get my initial raw data (as it was uploaded for analysis) together with the cluster labels?

Thanks, Ilona
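
One general way to get raw values next to cluster labels is to join the clustered (normalized) rows back to the raw rows on a shared ID. The sketch below shows that idea in plain Python. It assumes both tables carry a common `id` column and that the label column is called `cluster`; both names, the helper function and the sample rows are hypothetical, and the Auto Model output may be organised differently.

```python
def attach_cluster_labels(raw_rows, clustered_rows, id_key="id", label_key="cluster"):
    """Copy the cluster label from the normalized rows onto the raw rows by ID."""
    labels = {row[id_key]: row[label_key] for row in clustered_rows}
    # Rows without a matching ID get None as their label.
    return [{**row, label_key: labels.get(row[id_key])} for row in raw_rows]


# Made-up raw data as it might look before normalization.
raw_rows = [
    {"id": 1, "revenue": 1200.0, "visits": 3},
    {"id": 2, "revenue": 90.0, "visits": 14},
]

# Made-up clustering output: same IDs, normalized values, plus a label.
clustered_rows = [
    {"id": 1, "revenue": 0.71, "visits": -0.71, "cluster": "cluster_0"},
    {"id": 2, "revenue": -0.71, "visits": 0.71, "cluster": "cluster_1"},
]

merged = attach_cluster_labels(raw_rows, clustered_rows)
```

The merged rows keep the original, un-normalized values and can then be written to a file. In a RapidMiner process the equivalent would presumably be to generate an ID before normalization and join on it afterwards, which is an assumption about the process rather than something confirmed in the post.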

↧